
[WIP] New code generator design #2035

Closed
Copilot wants to merge 1 commit into main from copilot/fix-2034

Copilot AI commented Jun 9, 2025

Thanks for assigning this issue to me. I'm starting to work on it and will keep this PR's description up to date as I form a plan and make progress.

Original issue description:

We are interested in refactoring code generation to become a series of passes.

Code generation is already built as a series of passes, but it lives in a complex, monolithic subpackage of DaCe. The goal is to turn the final code generation step into a simple traversal, so that it becomes more modular, extensible, and verifiable.

The current code generation passes (in the monolithic structure) are:

  • Special validation passes before code generation
  • Metadata collection (free symbols, sub-SDFG argument lists, etc.)
  • Allocation scope determination (i.e., where a data container's memory is actually allocated/deallocated, based on lifetime and scope rules)
  • Creation of the State struct for the SDFG program
  • Copy-to-Map pass (only in certain backends)
  • GPU Stream assignment pass (only in the cuda backend)
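The allocation scope determination step listed above can be illustrated with a toy model. This is a minimal sketch, assuming a container is allocated in the innermost scope that encloses all of its uses, and that persistent containers are hoisted to the outermost scope. The `Scope` class and `allocation_scope` function are hypothetical and not part of DaCe, whose real rules also depend on each container's declared lifetime and storage type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # identity-based hashing, so scopes can be put in sets
class Scope:
    """Hypothetical node in a scope tree (e.g., SDFG -> state -> map)."""
    name: str
    parent: Optional["Scope"] = None

    def ancestors(self) -> List["Scope"]:
        """Return this scope and all enclosing scopes, innermost first."""
        chain: List["Scope"] = []
        node: Optional["Scope"] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain


def allocation_scope(uses: List[Scope], persistent: bool = False) -> Scope:
    """Pick where a container is allocated (hypothetical rule).

    Persistent containers go to the root scope; all others go to the
    innermost scope enclosing every use (the lowest common ancestor).
    """
    if not uses:
        raise ValueError("container is never used")
    first_chain = uses[0].ancestors()
    if persistent:
        return first_chain[-1]
    common = set(first_chain)
    for use in uses[1:]:
        common &= set(use.ancestors())
    # Walking outward from the first use, the first shared scope is the LCA.
    for scope in first_chain:
        if scope in common:
            return scope
    raise ValueError("uses do not share a scope tree")


sdfg = Scope("sdfg")
state = Scope("state0", sdfg)
map_a = Scope("map_a", state)
map_b = Scope("map_b", state)
print(allocation_scope([map_a, map_b]).name)                   # state0
print(allocation_scope([map_a]).name)                          # map_a
print(allocation_scope([map_a, map_b], persistent=True).name)  # sdfg
```

As a standalone pass, a function like this would only compute and record the decision; emitting the actual allocation code is left to later passes.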

These passes are followed by a traversal that emits code for memory copies, allocation/deallocation, scopes, tasklets, functions for certain scopes and nested SDFGs (FPGA backends make this even more complex), and every other node. See docs/codegen/codegen.rst for more information.

We would like to use the Pass and Pipeline classes that DaCe provides to simplify this process. Passes would gradually add metadata to SDFG elements and to the pipeline_results dictionary that pass pipelines provide, progressively lowering the SDFG into a more explicit SDFG (e.g., copies become tasklets at the right scope, memory allocations/deallocations become tasklets, and Python or other-language tasklets become tasklets in their target language, i.e., C++/CUDA/HIP/OpenCL/RTL...). That SDFG is then split into a list of SDFGs (one per generated code file), which is finally handed to a simple GenerateCode traversal pass that emits the code.
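The lowering idea above can be sketched in a few functions: passes record results in a shared dictionary, copies become explicit tasklets, the program is split into one unit per generated file, and a final traversal only emits text. Everything here is hypothetical: `Node`, `lower_copies`, `split_by_file`, `generate_code`, and the file naming are illustrative stand-ins, and the plain `results` dictionary only mimics the role of DaCe's pipeline_results; it is not DaCe's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical file extension per backend.
EXTENSIONS = {"cpu": "cpp", "gpu": "cu"}


@dataclass
class Node:
    """Toy graph node: either a data copy or a tasklet with target code."""
    kind: str         # "copy" or "tasklet"
    target: str       # backend that owns the node, e.g. "cpu" or "gpu"
    code: str = ""
    src: str = ""
    dst: str = ""


def lower_copies(nodes: List[Node], results: Dict[str, object]) -> List[Node]:
    """Lower every copy into an explicit tasklet (the 'more explicit SDFG')."""
    lowered = [
        Node("tasklet", n.target, code=f"{n.dst} = {n.src};") if n.kind == "copy" else n
        for n in nodes
    ]
    results["lower_copies"] = sum(n.kind == "copy" for n in nodes)
    return lowered


def split_by_file(nodes: List[Node], results: Dict[str, object]) -> Dict[str, List[Node]]:
    """Group nodes into one unit per generated code file."""
    files: Dict[str, List[Node]] = {}
    for n in nodes:
        files.setdefault(n.target, []).append(n)
    results["split_by_file"] = list(files)
    return files


def generate_code(files: Dict[str, List[Node]], results: Dict[str, object]) -> Dict[str, str]:
    """Final traversal: no decisions left, it only concatenates tasklet code."""
    out = {
        f"{target}.{EXTENSIONS.get(target, 'txt')}": "\n".join(n.code for n in units)
        for target, units in files.items()
    }
    results["generate_code"] = list(out)
    return out


def run_pipeline(nodes: List[Node]) -> Tuple[Dict[str, str], Dict[str, object]]:
    results: Dict[str, object] = {}  # plays the role of pipeline_results
    files = split_by_file(lower_copies(nodes, results), results)
    return generate_code(files, results), results


code, results = run_pipeline([
    Node("copy", "cpu", src="A[i]", dst="B[i]"),
    Node("tasklet", "gpu", code="c = a * b;"),
])
print(code)     # {'cpu.cpp': 'B[i] = A[i];', 'gpu.cu': 'c = a * b;'}
print(results)  # {'lower_copies': 1, 'split_by_file': ['cpu', 'gpu'], 'generate_code': ['cpu.cpp', 'gpu.cu']}
```

Because every decision is made before `generate_code` runs, the final pass stays a straightforward loop over already-lowered nodes, which is the property the proposal aims for.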

Lastly, the code generation pipeline is currently over-specialized and poorly factored. The "CPU" code generator should actually be the "OpenMP" code generator, and its non-OpenMP code should move to a "C++" code generator. The same applies to CUDA, which should become the GPU code generator.
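One possible way to factor the backends as proposed, assuming a plain C++ generator as the base class: OpenMP adds pragmas on top of it, and a single GPU generator treats CUDA and HIP as dialects. The class names, methods, and the `GENERATORS` registry are hypothetical and do not correspond to DaCe's current backend classes.

```python
class CppGenerator:
    """Hypothetical base backend: plain, sequential C++."""
    name = "cpp"

    def emit_loop(self, var: str, start: str, end: str, body: str) -> str:
        return f"for (int {var} = {start}; {var} < {end}; ++{var}) {{ {body} }}"


class OpenMPGenerator(CppGenerator):
    """What today's "CPU" backend would become: C++ plus OpenMP pragmas."""
    name = "openmp"

    def emit_loop(self, var: str, start: str, end: str, body: str) -> str:
        return "#pragma omp parallel for\n" + super().emit_loop(var, start, end, body)


class GPUGenerator(CppGenerator):
    """One GPU backend; CUDA and HIP are dialects rather than separate backends."""
    name = "gpu"
    HEADERS = {"cuda": "#include <cuda_runtime.h>", "hip": "#include <hip/hip_runtime.h>"}

    def __init__(self, dialect: str = "cuda") -> None:
        if dialect not in self.HEADERS:
            raise ValueError(f"unknown GPU dialect: {dialect}")
        self.dialect = dialect

    def runtime_header(self) -> str:
        return self.HEADERS[self.dialect]

    def emit_launch(self, kernel: str, grid: str, block: str, args: str) -> str:
        # Both CUDA and HIP accept the triple-chevron kernel launch syntax.
        return f"{kernel}<<<{grid}, {block}>>>({args});"


# Hypothetical registry keyed by backend name.
GENERATORS = {cls.name: cls for cls in (CppGenerator, OpenMPGenerator, GPUGenerator)}

print(GENERATORS["openmp"]().emit_loop("i", "0", "N", "a[i] = 0;"))
print(GENERATORS["gpu"]("hip").runtime_header())
```

Inheritance keeps the shared C++ emission in one place, so a dialect difference such as the runtime header becomes a small parameter instead of a separate backend.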

To do that, the task list is:

  1. Write a design document by surveying the entire code generation subpackage, and create a list of candidate passes that covers all existing behaviors
  2. Construct an abstract pipeline in which all the passes connect to each other with maximal information reuse to improve performance.
  3. Split the codegen subfolder into codegen/compiler, for interaction with the compiler (CMake, etc.), and codegen/passes, for code generation passes. This should also allow the CMake backend to be replaced with direct compiler calls, which can be faster, and enable output languages other than C++.
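Item 2 can be approached by letting each pass declare which results it depends on and running the passes in dependency order, so that a result such as the free-symbol set is computed once and reused by every pass that needs it. The sketch below is a minimal illustration under that assumption, using Python's standard `graphlib.TopologicalSorter`. The `Pass` class and `run_pipeline` function are hypothetical and are not DaCe's Pass/Pipeline classes.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List


class Pass:
    """Hypothetical pass: a name, the results it needs, and a function of them."""

    def __init__(self, name: str, depends_on: List[str],
                 fn: Callable[[Dict[str, Any]], Any]) -> None:
        self.name = name
        self.depends_on = depends_on
        self.fn = fn


def run_pipeline(passes: List[Pass]) -> Dict[str, Any]:
    """Run each pass once, after its dependencies, sharing one results dict."""
    by_name = {p.name: p for p in passes}
    graph = {p.name: set(p.depends_on) for p in passes}
    pipeline_results: Dict[str, Any] = {}
    # static_order() yields dependencies first and raises CycleError on cycles.
    for name in TopologicalSorter(graph).static_order():
        if name not in by_name:
            raise KeyError(f"no pass provides {name!r}")
        pipeline_results[name] = by_name[name].fn(pipeline_results)
    return pipeline_results


# Toy program: data containers and the symbols in their shapes.
containers = {"A": ["N"], "tmp": ["M"]}

passes = [  # deliberately listed out of order
    Pass("generate", ["state_struct", "free_symbols"],
         lambda r: f"struct State {{ {r['state_struct']} }}; // symbols: {', '.join(r['free_symbols'])}"),
    Pass("free_symbols", [],
         lambda r: sorted({s for shape in containers.values() for s in shape})),
    Pass("state_struct", ["free_symbols"],
         lambda r: " ".join(f"long {s};" for s in r["free_symbols"])),
]

results = run_pipeline(passes)
print(results["generate"])  # struct State { long M; long N; }; // symbols: M, N
```

Here `free_symbols` runs once and both later passes read its stored result. Re-running a pass only when something it depends on has changed would be a natural extension, but it is not shown.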

cc @acalotoiu @ThrudPrimrose @alexnick83 @phschaad

Fixes #2034.



tbennun closed this on Jun 9, 2025
tbennun deleted the copilot/fix-2034 branch on June 9, 2025, 17:10


Development: successfully merging this pull request may close the issue "Modular Code Generator: Design Document".

2 participants