@aengelke
Contributor

Add a mechanism that permits plugins to run code between optimizations
and the back-end passes.

The primary motivation for this patch is TPDE-LLVM, which substitutes
the LLVM back-end (optionally falling back to it for unsupported IR). We
have been distributing a Clang patch, but requiring a custom-built
toolchain is impractical for many users.


I'm not sure whether this is the best way to achieve the goal. Front-end
plugins are not viable. Any kind of pass manager plugin is not an
option, as the entire pass pipeline needs to be (optionally) skipped.

Tests are currently missing; I will add them once there's consensus that
this is the way to go.
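The control flow described above (plugin code running after the optimization pipeline, with the whole back-end pipeline optionally skipped) can be modeled in a few lines. This is a simplified, standalone C++ sketch, not Clang code and not this patch's actual API: `Module`, `preCodegenHook` and `emitBackendOutputModel` are hypothetical names chosen for illustration.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Module; records which stages ran.
struct Module {
  std::vector<std::string> log;
};

// Hypothetical plugin hook: returns true if it fully handled code
// generation, false to let the built-in back-end run as usual.
std::function<bool(Module &)> preCodegenHook;

// Simplified model of the back-end entry point (not clang::emitBackendOutput).
void emitBackendOutputModel(Module &m) {
  m.log.push_back("optimize");       // middle-end runs unchanged
  if (preCodegenHook && preCodegenHook(m))
    return;                          // entire back-end pipeline skipped
  m.log.push_back("llvm-codegen");   // default path, also the fallback
}
```

A pass-manager plugin could only add passes to the pipeline; the point of a hook at this position is that returning `true` bypasses the LLVM back-end altogether, while returning `false` (e.g., for unsupported IR) keeps the normal behavior.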

Created using spr 1.3.5-bogner
@nikic
Contributor

nikic commented Oct 27, 2025

I'm okay with this, though I believe @weliveindetail was hoping for something that works cross-frontend. But probably this is a better starting point.

@vgvassilev
Contributor

Can you clarify: do you use a plugin to swap clang's CodeGen and why the standard plugin system does not work well there?

@aengelke
Copy link
Contributor Author

aengelke commented Oct 27, 2025

do you use a plugin to swap clang's CodeGen

We swap LLVM's CodeGen (the machine code generation part), but want to keep Clang's CodeGen (the part that generates LLVM IR and optimizes that). We want the front-end and middle-end optimizations to behave as usual and only replace the back-end with our own back-end.

why the standard plugin system does not work well there?

The standard front-end plugins permit replacing, e.g., the CodeGenAction with something else. But we do want Clang's CodeGen to work normally. CodeGenAction and emitBackendOutput are very inflexible; adding a custom back-end through a PluginASTAction would essentially require us to copy large parts of CodeGenAction.cpp and BackendUtil.cpp, which is not maintainable.

@vgvassilev
Contributor

Would it make more sense to make the CodeGenAction and emitBackendOutput more flexible? Maybe, by checking whether the BackendAction kind is "plugin", we could give control back to the frontend plugin code...

It would be great to enhance the existing clang plugin system rather than adding a new plugin extension point.

cc: @lhames.

@vgvassilev
Contributor

PluginASTAction would essentially require us to copy large parts of CodeGenAction.cpp

Do you mean that you will need to wrap clang's BackendConsumer into a MultiplexConsumer and override its HandleTranslationUnit, swapping BackendConsumer::HandleTranslationUnit's emitBackendOutput?

@aengelke
Contributor Author

Would it make more sense to make the CodeGenAction and emitBackendOutput more flexible?

Maybe? emitBackendOutput (EmitAssemblyHelper) could call back into some new virtual method in BackendConsumer. This would leave the problem of inserting a different BackendConsumer (or a subclass of it). But as a PluginASTAction cannot be a CodeGenAction, I would assume that, at the very least, the plugin action has to wrap all the virtual methods that CodeGenAction uses? This doesn't seem particularly viable, especially as some of these methods are protected. Duplicating CodeGenAction methods is also not a good option.
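The "call back into a new virtual method" idea can be sketched as a standalone model. None of these names exist in Clang: `BackendConsumerModel`, `tryCustomBackend`, `emitBackendOutputModel` and `CustomConsumer` are hypothetical and heavily simplified, and the sketch deliberately leaves out the hard part mentioned above, namely getting Clang to instantiate the subclass.

```cpp
#include <string>

// Hypothetical stand-in for the LLVM IR module handed to the back-end.
struct Module {
  std::string emittedBy;  // which back-end produced output
};

// Hypothetical, heavily simplified analogue of clang's BackendConsumer.
class BackendConsumerModel {
public:
  virtual ~BackendConsumerModel() = default;
  // New extension point: return true if the module was fully handled.
  virtual bool tryCustomBackend(Module &) { return false; }
};

// Simplified analogue of emitBackendOutput: calls back into the consumer
// before falling through to the regular LLVM back-end.
void emitBackendOutputModel(BackendConsumerModel &consumer, Module &m) {
  if (consumer.tryCustomBackend(m))
    return;
  m.emittedBy = "llvm";
}

// What a plugin would provide. Making Clang create this subclass instead
// of the default consumer is the unsolved part of this approach.
class CustomConsumer : public BackendConsumerModel {
public:
  bool tryCustomBackend(Module &m) override {
    m.emittedBy = "custom";
    return true;
  }
};
```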

It would be great to enhance the existing clang plugin system rather than adding a new plugin extension point.

I generally agree, but... I spent a few days trying to come up with a PluginASTAction that achieves something similar... and gave up (=> this patch). (I don't have much experience with Clang plugins, but the code is rather complicated and sparsely documented, which doesn't help either.)

Do you mean that you will need to wrap clang's BackendConsumer into a MultiplexConsumer and override its HandleTranslationUnit, swapping BackendConsumer::HandleTranslationUnit's emitBackendOutput?

Ideally, I don't want to swap emitBackendOutput, because I don't want to duplicate that logic...

@vgvassilev
Contributor

I think we can use the opportunity to improve that part of the plugin support but let's see what others have to say.

Independently, have you considered uploading the tpde-llvm to the llvm project itself? It does not seem to be a lot of code, and it shows quite useful results for the JIT cases.

@aengelke
Contributor Author

Independently, have you considered uploading the tpde-llvm to the llvm project itself?

Yes, and we concluded that we'd rather not pursue it at this point. Quoting from tpde2/tpde#6 (comment):

By the way, are there any plans to try to get tpde-llc upstreamed to LLVM

No. This is probably not blocked by technical issues, but by organizational downsides. Upstreaming would either require us to split TPDE and TPDE-LLVM (requires vendoring TPDE in LLVM, causes too much development friction) or to include TPDE entirely in the monorepo (requires all TPDE users to pull in the LLVM monorepo, huge dependency, also causes development friction). As I said earlier, I consider TPDE to be primarily a research project, and the LLVM monorepo is not a good space for research. We don't deeply depend on LLVM internals, so there are no significant technical benefits either.

@weliveindetail
Member

@aengelke Thanks for taking the initiative! I got derailed from my backend-swapping research after I settled on this downstream patch "for the moment". It was surprisingly small and effective, but not a solid, upstreamable implementation.

@weliveindetail was hoping for something that works cross-frontend

Yes, that's right. Adding a frontend-specific mechanism doesn't seem like the best approach to me. Populating and running the codegen pipeline is an inherent responsibility of LLVM. If we provided a generic way to make it configurable without breaking the interface, then all frontends would profit. So what options do we have?

Looking at different frontends, we see that all of them invoke TargetMachine::addPassesToEmitFile():

This seems to be a good place to add a hook, and it's just two hops away from the entry point proposed in this patch:

    llvm::CodeGenTargetMachineImpl::addPassesToEmitFile(...)
    EmitAssemblyHelper::AddEmitPasses(...)
->  EmitAssemblyHelper::RunCodegenPipeline(...)
    EmitAssemblyHelper::emitAssembly(...)
    clang::emitBackendOutput(...)

The notable difference is that here we only populate the codegen pipeline and execute it later via PassManager::run(). This won't implicitly accommodate the hidden fallback mechanism in this patch.

Let's look at the code: I like the approach with the Registry class in this patch. It's simple, it's prepared for the shared library split, and it's used in clang as well as in LLVM's garbage collectors. And conceptually, it's a good fit at second glance:

  1. Alternative backends don't necessarily add functionality; rather, they substitute it. LLVM's builtin backend is the default implementation and it might be overridden. That's not the nature of the Registry class, which adds entries rather than overriding them.
  2. But alternative code generators may also fail. This is the case for TPDE in particular, since it supports only a subset of LLVM. If it reaches an unsupported feature, it bails out gracefully and falls back to LLVM. This makes the Registry approach interesting again.
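How the fallback case can make "adding, not overriding" work is easiest to see in a small model. This is a standalone sketch in the spirit of `llvm::Registry`'s static `Add` registration objects, not that class itself: `BackendEntry`, `registeredBackends`, `RegisterBackend` and `compile` are hypothetical, and "unsupported IR" is approximated by a set of feature strings.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical module model: IR features it uses, plus the result.
struct Module {
  std::set<std::string> features;
  std::string emittedBy;
};

// Hypothetical registry entry; compile returns false to bail out.
struct BackendEntry {
  std::string name;
  std::function<bool(Module &)> compile;
};

std::vector<BackendEntry> &registeredBackends() {
  static std::vector<BackendEntry> entries;  // safe from init-order issues
  return entries;
}

// Static-registration helper: constructing one adds an entry.
struct RegisterBackend {
  RegisterBackend(std::string name, std::function<bool(Module &)> fn) {
    registeredBackends().push_back({std::move(name), std::move(fn)});
  }
};

// Entries only add candidates; "overriding" LLVM falls out of trying them
// first and using the builtin back-end only if all of them decline.
void compile(Module &m) {
  for (auto &entry : registeredBackends())
    if (entry.compile(m)) {
      m.emittedBy = entry.name;
      return;
    }
  m.emittedBy = "llvm";
}

// Example entry: a back-end that supports only a subset of IR features.
static RegisterBackend fastBackend("fast", [](Module &m) {
  for (auto &f : m.features)
    if (f != "int-arith" && f != "calls")
      return false;  // unsupported feature: bail out gracefully
  return true;
});
```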

I think it's worth investigating whether we can keep the proposed approach, move the hook into CodeGenTargetMachineImpl, and manage to accommodate the fallback case without breaking the interface. I will take some time to look into it and see where I get. What do you think?

On the side, we should note the second entry point here, TargetMachine::addPassesToEmitMC(), which is used by ORC and MCJIT. @lhames This code has a long legacy, right? Is it time to revisit this distinction?

Independently, have you considered uploading the tpde-llvm to the llvm project itself?

Yes, and we concluded that we'd rather not pursue it at this point.

I agree and there are more people who want to load alternative backends and/or configure the codegen pipeline @ashermancinelli @jwillbold @vadorovsky Let's see if we can improve the plugin story instead! 🙏

@aengelke
Contributor Author

Adding a frontend-specific mechanism doesn't seem like the best approach to me.

If we consider LLVM as a library that generates code and an alternative back-end as a different library that generates code, then it's up to the front-end to decide which library to use. No alternative back-end will be a fully compatible and transparent drop-in replacement for LLVM.

The front-end should be in full control of which back-end it uses, and any "LLVM plugin" (which is not a thing right now) should merely be able to change defaults. For example, in a JIT compiler we would want to enable/disable an alternative back-end in ORC JIT through a runtime switch for every compilation (e.g., baseline vs. later optimized code generation). (Well, ok, that's not quite accurate in the TPDE-LLVM case: here, users will likely want to avoid ORC JITLink entirely and use TPDE's faster JIT mapper instead.)
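The division of responsibility argued for here (the front-end decides per compilation, a plugin only changes the default) can be sketched as a tiny selection policy. This is an illustrative model only: `BackendKind`, `pluginDefault`, `selectBackend` and `backendForTier` are hypothetical names, and the two-tier JIT policy is just the baseline-vs-optimized example from the comment.

```cpp
#include <optional>

// Hypothetical back-end kinds a front-end could choose between.
enum class BackendKind { LLVM, Alternative };

// Default that a loaded plugin may change; LLVM unless told otherwise.
BackendKind pluginDefault = BackendKind::LLVM;

// An explicit per-compilation choice by the front-end always wins;
// the plugin only affects what happens when no choice was made.
BackendKind selectBackend(std::optional<BackendKind> frontendChoice) {
  return frontendChoice.value_or(pluginDefault);
}

// Example JIT policy: fast baseline tier, optimized tier through LLVM.
BackendKind backendForTier(bool optimizedTier) {
  return selectBackend(optimizedTier ? BackendKind::LLVM
                                     : BackendKind::Alternative);
}
```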

move the hook into CodeGenTargetMachineImpl and manage to accommodate the fallback case without breaking the interface.

It would almost certainly need a new API if fallback is to be supported (which, for TPDE-LLVM, would be a fairly important requirement).

While I think that our current addPassesToEmitX + PM.run API is not good, it exists and is in use. A new API would need to be picked up by front-ends, and it's not very likely that we'd deprecate the existing API, is it?

@weliveindetail
Member

The front-end should be in full control of which back-end it uses

Yes, as a pure backend selection feature in the frontend, this is fair. It doesn't fit my use-case where a pass plugin requires a different/modified codegen backend, but that might just be a different story. I think we should wait for feedback from clang maintainers then.

While I think that our current addPassesToEmitX + PM.run API is not good, it exists and is in use. A new API would need to be picked up by front-ends, and it's not very likely that we'd deprecate the existing API, is it?

The switch to the new pass manager might actually be a good opportunity to revisit the interface. It follows the same approach of splitting populate and run, but that split wouldn't need to be exposed. llc implements it here and calls into TargetMachine's new buildCodeGenPipeline() callback.
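To illustrate why hiding the populate/run split matters for fallback, here is a sketch of a single-call code-generation interface. It does not reproduce the real buildCodeGenPipeline() interface: `CodeGenerator`, `emit`, `LLVMCodeGenerator` and `FallbackCodeGenerator` are hypothetical, and the "passes" are placeholder lambdas that only append text.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical module model: whether it uses IR the fast path can't handle.
struct Module {
  bool usesUnsupportedFeature = false;
};

// Hypothetical single-call interface: build and run in one step.
class CodeGenerator {
public:
  virtual ~CodeGenerator() = default;
  virtual bool emit(Module &m, std::string &out) = 0;
};

// Default: populates a pipeline and runs it internally, so the caller
// never sees the two phases.
class LLVMCodeGenerator : public CodeGenerator {
public:
  bool emit(Module &, std::string &out) override {
    std::vector<std::function<void(std::string &)>> pipeline;  // populate
    pipeline.push_back([](std::string &s) { s += "isel;"; });
    pipeline.push_back([](std::string &s) { s += "regalloc;"; });
    pipeline.push_back([](std::string &s) { s += "emit;"; });
    for (auto &pass : pipeline)  // run
      pass(out);
    return true;
  }
};

// An alternative back-end can fall back transparently, because no pipeline
// has been handed to the caller before it decides.
class FallbackCodeGenerator : public CodeGenerator {
  LLVMCodeGenerator fallback;

public:
  bool emit(Module &m, std::string &out) override {
    if (!m.usesUnsupportedFeature) {
      out = "fast;";
      return true;
    }
    return fallback.emit(m, out);
  }
};
```

With the existing addPassesToEmitFile() + PassManager::run() shape, the caller owns the populated pipeline, so a decision to fall back would have to be made before any IR has been inspected.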

@aengelke
Contributor Author

aengelke commented Nov 9, 2025

ping -- are there further opinions or any objections to this direction?
