Skip to content

Conversation

ilya-biryukov
Copy link
Contributor

@ilya-biryukov ilya-biryukov commented May 27, 2025

Instead of using ASTUnit and tooling APIs, directly mimic what Clang does during the compilation.

Run the Clang frontend until the translation unit parsing is done, at which point switch back to Carbon to finish the Check phase and interface with Clang through ASTContext and Sema. In lower, finish the corresponding Clang compilation phase, i.e. CodeGen.

Clang does not have the corresponding APIs and instead provides a callback-based mechanism. To map this back to Carbon APIs, we create a separate thread that gives control back to Carbon through the callbacks, and then finishes the compilation when the Carbon code is done.

Although essentially a hack, this allows to easily fit into the Carbon codebase quickly and we should have an option of refactoring Clang code in LLVM upstream if this approach proves fruitful.

Replace this paragraph with a description of what this PR is changing or
adding, and why.

Closes #ISSUE

@danakj
Copy link
Contributor

danakj commented May 27, 2025

Run the Clang frontend until the translation unit parsing is done, at which point switch back to Carbon to finish the Check face

Did you mean Check phase?

@ilya-biryukov
Copy link
Contributor Author

This is very raw: tests fail, an approach to codegen needs to be updated, the documentation and PR description needs to be improved.

Still posting this to get early feedback about the feasibility of this approach, especially the dance with multiple threads.
Sharing the same CompilerInstance, AST, Sema, etc between multiple threads is not unheard of, but, like other use-cases (e.g. clangd), it's unusual in its own way. LLVM should be prepared to handle that, though. Even if there are few global variables / thread locals left, they should be easily fixable.

@bricknerb bricknerb self-requested a review May 27, 2025 18:32
@ilya-biryukov
Copy link
Contributor Author

Did you mean Check phase?

🤦 I did, thanks for pointing that out.

@ilya-biryukov
Copy link
Contributor Author

I've picked this up again and it's shaping up now, should be close to something I'd like to land.
In particular, we now use the code generator from Clang rather than creating our own and take the llvm::Module Clang produces.

I want to do one last polish of the code before sending this out for review, but it's now roughly what I want it to be in terms of behavior. One last bit that I want to do a little later is to get rid of the use of CodeGenerator for marking which functions are used and instead go through Clang frontend interfaces. But since it's pretty independent, I am thinking of doing it as a follow-up.

@ilya-biryukov
Copy link
Contributor Author

The code is now final, this is one rebase away from moving off the "draft" state.

@ilya-biryukov ilya-biryukov force-pushed the in_flight_clang branch 3 times, most recently from 7de0bf0 to 8f31d64 Compare July 4, 2025 15:07
// Flags to match LLVM IR output of default code generation options, which
// was used in an earlier version.
// TODO: remove them when sharing flags with ClangRunner.
"-fomit-frame-pointer", "-fno-unwind-tables",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am really unsure if I should be adding this and it should definitely go away after the TODO about sharing with ClangRunner is finished.

Happy to take either path here and not sure which defaults Carbon uses now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there's a little overlap between this and #5767. Sorry about that; I was hoping the parts I was touching were more separate from the parts you were touching here.

I think we can fully separate them if we make this PR do just a little bit less: instead of having InFlightClang take driver arguments and run the driver to from frontend arguments, could you make it take a shared_ptr<CompilerInvocation> instead, and build that here using clang::CompilerInvocation::CreateFromArgs, passing in the same frontend argument list we currently do?

With that done, we can deal with using the driver to map from driver arguments to frontend arguments in #5767 (including plumbing through the proper path to the clang binary so that we find the resource directory properly), and it should be straightforward to combine the two patches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding your question about keeping output the same: if you want to decouple this PR from changes to the output, one way to do that would be to overwrite the CodeGenOptions in the CompilerInvocation with a default-constructed CodeGenOptions for now (and just set the EmitVersionIdentMetadata flag to false). For things that we want full control over, changing them via CompilerInvocation seems better than trying to control them with driver arguments.

Instead of using ASTUnit and tooling APIs, directly mimic what Clang
does during the compilation.

As a result, we now correctly generate definitions for inline C++
functions called by Carbon and functions called recursively by them.

Run the Clang frontend until the translation unit parsing is done, at
which point switch back to Carbon to finish the `Check` face and
interface with Clang through `ASTContext` and `Sema`. In lower, finish
the corresponding Clang compilation phase (CodeGen) and consume the
resulting `llvm::Module`.

Clang does not have the corresponding APIs directly and instead provides
a callback-based mechanism. To map this back to Carbon APIs, we create a
separate thread that gives control back to Carbon through the callbacks,
and then finishes the compilation when the Carbon code is done.

Although essentially a hack, this allows to easily fit into the Carbon
codebase quickly and we have an option of refactoring Clang code in LLVM
upstream in the future to do this directly if this approach proves
fruitful.
@@ -112,7 +112,6 @@ fn MyF() {
// CHECK:STDOUT: target triple = "x86_64-unknown-linux-gnu"
// CHECK:STDOUT:
// CHECK:STDOUT: define void @_CMyF.Main() !dbg !7 {
// CHECK:STDOUT: entry:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why entry: got removed here (because of -fno-exceptions maybe?). It also doesn't really matter at this point, probably.

@ilya-biryukov ilya-biryukov marked this pull request as ready for review July 4, 2025 15:14
@ilya-biryukov
Copy link
Contributor Author

This is now ready for a proper review.

Copy link
Contributor

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several issues found with pre-commit (also flagged in the CI failures), and some other C++ style issues that I think clang-tidy checks should catch like naming conventions not matching Carbon code, and generally following the Carbon style guide.

The biggest other issue that jumps out at me on the initial pass is in the comment below -- the system for managing all of this should have some substantial comments explaining how it works and how to use it.

Last but not least, we'll want to resolve how to layer this PR and the one that just went out in the same area: #5767

#include "llvm/ADT/IntrusiveRefCntPtr.h"

namespace Carbon {
class InFlightClang {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this really should have some detailed class comments to explain how it works and what it does?

@chandlerc
Copy link
Contributor

Looking at #5767, it goes seem to duplicate some of the better calling of clang that is here, but not all of it. Notably, moving it into a separate thread to manage stack size etc seems really nice, but also a bigger change. And there are a bunch of related fixes in 5767 too, so my suggestion would be to rebase this PR on that one if not too much trouble. Does that make sense?

@chandlerc
Copy link
Contributor

Looking at #5767, it goes seem to duplicate some of the better calling of clang that is here, but not all of it. Notably, moving it into a separate thread to manage stack size etc seems really nice, but also a bigger change. And there are a bunch of related fixes in 5767 too, so my suggestion would be to rebase this PR on that one if not too much trouble. Does that make sense?

(Sorry, I didn't see Richard's comments on one of the threads in the files that crossed path with my looking at this -- happy for you and Richard to figure out whatever approach makes sense to you all.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants