Modular Code Generator: Complete Design Document #2036
Copilot wants to merge 6 commits into design-docs from
Conversation
Please do not merge until everyone has had a chance to comment, modify, and approve.
tbennun left a comment
This looks great! I have some comments on the ordering and the meaning of some of the components, but the rest of the components look perfect.
@copilot additionally, how would I create a new backend implementation (e.g., a new platform, or a different language like LLVM IR)? What would change in the design? Could you please add another document that explains how the modular code generation can be extended w.r.t. those two cases?
…, improve documentation Co-authored-by: tbennun <8348955+tbennun@users.noreply.github.com>
I've added a comprehensive extension guide.

For new platforms (e.g., neuromorphic processors):

For new languages (e.g., LLVM IR):

The guide demonstrates how the modular design makes extensions straightforward through clear interfaces and automatic registration, with comprehensive testing examples and best practices.
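To make the "clear interfaces and automatic registration" idea concrete, here is a minimal, self-contained sketch of how a new backend (such as an LLVM IR emitter) could plug into a registry-based design. All names here (`BACKEND_REGISTRY`, `register_backend`, `TargetBackend`) are illustrative assumptions, not DaCe's actual API.

```python
# Hypothetical registry-based backend extension sketch.
# Every name below is an assumption for illustration only.

BACKEND_REGISTRY = {}


def register_backend(name):
    """Class decorator that records a backend under a given name,
    so new backends are discovered without editing core code."""
    def wrapper(cls):
        BACKEND_REGISTRY[name] = cls
        return cls
    return wrapper


class TargetBackend:
    """Minimal interface every backend implements."""

    def generate(self, ops):
        raise NotImplementedError


@register_backend("llvm-ir")
class LLVMIRBackend(TargetBackend):
    """Toy 'new language' backend: emits one comment line per operation."""

    def generate(self, ops):
        return "\n".join(f"  ; op: {op}" for op in ops)


# A caller looks the backend up by name instead of hard-coding it.
backend = BACKEND_REGISTRY["llvm-ir"]()
print(backend.generate(["add", "mul"]))
```

A new platform backend would follow the same pattern: subclass the base interface, decorate it with a name, and the rest of the pipeline reaches it through the registry.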
Includes detailed examples for creating new platform backends (neuromorphic processors) and language backends (LLVM IR), with complete code implementations, testing strategies, and integration patterns for the modular pipeline architecture. Co-authored-by: tbennun <8348955+tbennun@users.noreply.github.com>
tbennun left a comment
Apply my suggestions and see comments.
@copilot please address my latest review
Addressed all the latest review comments, including fixing pass headers, updating the target hierarchy, replacing the neuromorphic example with a comprehensive OpenCL example, and removing the LLVM IR example. See commit 4f14e86.
…nsive OpenCL example, fix pass headers, update target hierarchy

- Remove separate CUDA/HIP file references from directory structure
- Update phase descriptions to simplified target refactoring steps
- Replace neuromorphic processor example with sophisticated OpenCL code generator showing:
  - Custom allocation passes with memory access pattern analysis
  - SDFG splitting for kernel generation (.cl files)
  - Target code generator interaction with custom passes
  - Advanced features like async memory transfers and multi-file generation
- Remove redundant LLVM IR example section
- Fix TargetCodeGenerationPass header and add explicit documentation about calling target code generators
- Update conclusion to reference OpenCL example instead of removed examples

Co-authored-by: tbennun <8348955+tbennun@users.noreply.github.com>
Made my pass on the proposal (and removed the example, which does not make sense in the new construction of the codegen; we will have to discuss that separately). I'll wait for others to comment before manually editing the doc again. @mcopik, please comment on the design document before next Monday if you can (based on your recent experiences), so that we can ensure the design is sound and enables you to work in a simpler and more verified manner. Thanks!
I think Phase 2 should involve a pass that decides on allocation lifetimes of objects directly at the SDFG level. If I finish the sketch in time, I would like to discuss some ideas about it at the DaCe meeting on 12.06.
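As a rough illustration of what "deciding allocation lifetimes at the SDFG level" could mean, the sketch below computes, for each array, the first and last state that touches it. The flat `(state, accessed_arrays)` representation and the function name are assumptions for illustration; a real pass would walk actual SDFG states and access nodes.

```python
# Toy lifetime analysis over an ordered list of states.
# Representation and names are illustrative assumptions, not DaCe's API.

def allocation_lifetimes(states):
    """states: ordered list of (state_name, accessed_arrays) pairs.
    Returns {array: (first_state, last_state)} — the span over which
    the array must stay allocated."""
    lifetimes = {}
    for name, arrays in states:
        for arr in arrays:
            first, _ = lifetimes.get(arr, (name, name))
            lifetimes[arr] = (first, name)
    return lifetimes


lt = allocation_lifetimes([("s0", {"A"}), ("s1", {"A", "B"}), ("s2", {"B"})])
print(lt)  # {'A': ('s0', 's1'), 'B': ('s1', 's2')}
```

An allocation pass could then insert allocate/deallocate markers at the boundaries of each span, before any codegen-specific work happens.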
Completely agreed, that's proposed as
Not right now, but I want to discuss some ideas at the DaCe meeting.
@tbennun A few simple comments (which might be naive):
Overall, it makes sense to me. I would add, as an explicit goal, a template/blueprint for adding a new target.
@mcopik I think it makes sense and I also like the SYCL example. Thanks!
I consider GPU stream assignment to be a scheduling transformation, just like the GPUOffloading/ToGPU transformation, and I think these transformations should be performed before the codegen-preprocessing passes are executed. Is the document only about code generation, or do we combine it with the transformations that should have been applied before GPU offloading? Like the scheduling passes, I think the allocation pass should also run before codegen (and the codegen-preprocessing passes), as these transformations are not codegen-specific. Many accelerator environments support offloading and stream assignment (SYCL for Intel GPUs, say, also supports streams). I think the following transformations could form the new Phase 1:
Phase 0 would involve transformations that make the SDFG more suitable for the target accelerator. Consider the pattern Map[Range(1..N)] -> Tasklet -> Map[Range(1..N)]: if possible, duplicating the tasklet across the maps and fusing them would make sense for GPUs but not for CPUs, yet this transformation is not GPU-specific; one could apply it to an SDFG that is not yet specialized for GPUs.

Then we would have Phase 2 (old Phase 1) for the analysis and set-up for codegen, and the new Phase 3 (old Phase 2) for the accelerator-specific lowering passes that depend on the language (SYCL, CUDA, etc.).

By the way, I have a design document for the offloading pass (not restricted to GPUs), and the order of preparation transformations could look like this before offloading. Would it make sense to integrate it into this document? Link to the Google doc: https://docs.google.com/document/d/13Pl4A8u5YJgTkZvhKCoJp8t693tvHeEYXfUvDtyonDI/edit?usp=sharing
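The phase ordering proposed in this thread (scheduling transformations first, codegen set-up second, accelerator-specific lowering last) can be sketched as an ordered pipeline. The pass names and the dict-based stand-in for an SDFG are assumptions for illustration only.

```python
# Sketch of the proposed phase ordering: scheduling transformations run
# before codegen-specific passes. Names are illustrative assumptions.

class Pass:
    def apply(self, sdfg):
        # Record the pass name so the execution order is observable.
        sdfg.setdefault("log", []).append(type(self).__name__)


# Phase 1: target-independent scheduling transformations
class GPUOffloading(Pass): pass
class StreamAssignment(Pass): pass
class AllocationPass(Pass): pass

# Phase 2: analysis and set-up for codegen
class CodegenPreprocessing(Pass): pass

# Phase 3: language-dependent lowering (SYCL, CUDA, ...)
class TargetLowering(Pass): pass


pipeline = [GPUOffloading(), StreamAssignment(), AllocationPass(),
            CodegenPreprocessing(), TargetLowering()]

sdfg = {}  # stand-in for a real SDFG object
for p in pipeline:
    p.apply(sdfg)
print(sdfg["log"])
```

The point of the sketch is only the ordering: everything before `CodegenPreprocessing` is a plain SDFG transformation and could be reused by any accelerator backend that supports offloading and streams.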
targets/              # Target-specific generators (simplified)
├── __init__.py
├── base.py           # Base target interface (from target.py)
├── openmp.py         # OpenMP backend (split from cpu.py)
It looks like we are adding an OpenMP-specific target, so should we have a corresponding OpenMPCodegenPipeline?
Added OpenMPCodegenPipeline example in the target-specific sub-pipelines section showing how OpenMP-specific code generation would be handled. See commit 66be6e8.
Since the target-specific pipelines were too detailed for a design doc, this is no longer necessary to describe. We do want to enforce creating openmp.py, though, because of the cross-target entanglement.
Addressed your feedback including: clarifying CUDA/GPU backend language and scope of refactoring, adding mention of CPU/C++ backend entanglement issue, changing timeline to indicate full refactoring rather than just renaming, and adding explicit goal for template/blueprint for new targets. See commit 66be6e8.
Excellent points about the scope of transformations vs. codegen-specific passes. You're right that AllocationPass, CopyToMapPass, and StreamAssignmentPass could be viewed as pre-codegen transformations rather than codegen passes. This touches on a fundamental architectural question about where the boundary lies between SDFG transformation and code generation. Your proposed phase structure makes sense and aligns with having a clear separation of concerns. I'd suggest this becomes part of the broader discussion about the design scope in the upcoming DaCe meeting.
#### 12. **SDFGSplittingPass**
- **Purpose**: Split complex SDFGs into multiple files if needed
- **Input**: SDFG with memlet lowering complete
- **Output**: List of SDFGs (one per target file)
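One way to read the "one SDFG per target file" output is a partitioning step that groups top-level regions by their assigned target. The sketch below illustrates only that grouping idea; the flat `(region, target)` representation and function name are assumptions, not the proposed pass's actual interface.

```python
# Illustrative sketch of the splitting idea: partition regions by target
# so each target gets its own output file. Names are assumptions.
from collections import defaultdict


def split_by_target(regions):
    """regions: list of (region_name, target) pairs.
    Returns {target: [region_name, ...]} — one entry per target file."""
    files = defaultdict(list)
    for name, target in regions:
        files[target].append(name)
    return dict(files)


result = split_by_target([("map0", "cuda"), ("map1", "openmp"), ("map2", "cuda")])
print(result)  # {'cuda': ['map0', 'map2'], 'openmp': ['map1']}
```

A real pass would additionally have to generate the "orchestration" glue that the reviewer below asks about, i.e., the host-side code that invokes each split-out piece in order.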
I'm not sure I follow that proposal exactly. This sounds like it would result in a ton of individual SDFGs duct-taped together. The way I read it, this is what I understand:
- Given an SDFG with a whole bunch of Maps, some executed on the GPU as kernels, others run via OpenMP, this would construct a separate SDFG for each map nest (and possibly a large "orchestration" SDFG?), and each of these separate SDFGs gets distributed to a respective file in which the code for it is being generated?
Am I completely misunderstanding, or is that the idea? If so, what is the exact idea behind this? How does it add value to the process?
All comments addressed in #2071. The work on this PR is superseded by and cherry-picked into that PR.
This PR provides a comprehensive design document for refactoring DaCe's code generation system from a monolithic structure into a modular, pass-based pipeline architecture using DaCe's existing `Pass` and `Pipeline` infrastructure.

Overview
The current code generation system is a complex monolithic subpackage that handles everything from analysis to code emission in a single traversal. This design document proposes breaking it down into discrete, composable passes that can be tested, verified, and extended independently.
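The "discrete, composable passes" idea, together with the `pipeline_results` dictionary mentioned among the deliverables, can be sketched as follows. The pass names and the dict-based SDFG stand-in are assumptions for illustration, not the document's actual interfaces.

```python
# Minimal sketch of a pass-based pipeline sharing results through a
# common pipeline_results dictionary. All names are illustrative.

class AnalysisPass:
    def apply(self, sdfg, pipeline_results):
        # Record a result that later passes can reuse instead of recomputing.
        pipeline_results["num_nodes"] = len(sdfg["nodes"])


class EmitPass:
    def apply(self, sdfg, pipeline_results):
        # Consume the earlier analysis result when emitting code.
        pipeline_results["code"] = f"// generated for {pipeline_results['num_nodes']} nodes"


def run_pipeline(sdfg, passes):
    pipeline_results = {}
    for p in passes:
        p.apply(sdfg, pipeline_results)
    return pipeline_results


results = run_pipeline({"nodes": ["a", "b", "c"]}, [AnalysisPass(), EmitPass()])
print(results["code"])  # // generated for 3 nodes
```

Because each pass only reads and writes `pipeline_results`, it can be unit-tested in isolation by handing it a hand-built dictionary, which is the testability benefit the proposal is after.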
Key Deliverables
1. Main Design Document (`doc/codegen/modular_codegen_design.md`)
   - `pipeline_results` dictionary for maximal information reuse
   - Separation of `codegen/compiler` (build tools) from `codegen/passes` (generation passes)
2. Implementation Examples (`doc/codegen/pass_implementation_examples.md`)
   - `MetadataCollectionPass`, `AllocationAnalysisPass`, `FrameCodeGenerationPass`
   - `generate_code()` API

Benefits
Proposed Architecture
Target Refactoring
Current issues addressed:
Proposed hierarchy:
Implementation Roadmap
14-week timeline with incremental phases:
Backward Compatibility
The design maintains full backward compatibility:
- `generate_code(sdfg)` API unchanged

This design document provides the complete foundation for transforming DaCe's code generation into a modern, modular, and extensible system while preserving all existing capabilities.
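A backward-compatible entry point could keep the existing signature and assemble the new pipeline internally. This is a hedged sketch of that wrapper pattern; `default_pipeline` and the dict-based SDFG stand-in are assumptions, not the actual DaCe implementation.

```python
# Sketch: stable public API wrapping a modular pipeline internally.
# Names are illustrative assumptions.

def default_pipeline():
    """Assemble the ordered list of passes; here a single toy emit pass."""
    return [lambda sdfg, res: res.update(code=f"// code for {sdfg['name']}")]


def generate_code(sdfg):
    """Public API kept stable: callers see the same call as before,
    while the modular passes run under the hood."""
    results = {}
    for pass_fn in default_pipeline():
        pass_fn(sdfg, results)
    return results["code"]


print(generate_code({"name": "example"}))  # // code for example
```

Keeping the wrapper thin means callers never observe the refactoring, while tests and extensions can target the individual passes directly.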
Fixes #2034.