-
-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Replace RenderGraph with systems
#22144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
It looks like your PR is a breaking change, but you didn't provide a migration guide. Please review the instructions for writing migration guides, then expand or revise the content in the migration guides directory to reflect your changes. |
|
It looks like your PR has been selected for a highlight in the next release blog post, but you didn't provide a release note. Please review the instructions for writing release notes, then expand or revise the content in the release notes directory to showcase your changes. |
| ) | ||
| .entered(); | ||
|
|
||
| world.run_schedule(schedule); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the "big" reasons I've pushed back against RenderGraph -> Systems in the past is that it prevents us from parallelizing command encoding in cases like this where we need to run specific, potentially very expensive "subgraphs" in a loop.
The ownership and access patterns in the current system allow for that sort of thing (and were explicitly chosen to enable it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things that I'd like to note:
- With the exception of one caveat I'll note in the PR description, this isn't worse than the existing solution in terms of overall parallelism.
- In the context of cameras, we effectively currently need to be serial because we re-use
ViewTargettextures. Without a higher level resource tracking system, we have to pessimistically assume that cameras need to run in terms of the order they are given due to implicit dependencies. - Something like a "read only" schedule marker has been proposed before to allow running schedules on
&World. ECS devs have suggested to me this is feasible. - We do do mutation in the current graph in many places, but have to resort to hack-y atomics usage. It's not crazy that you want to mutate world sometimes (i.e. to insert some gpu resource to be used by the next frame).
- I'd like to solve issues around parallelism in the context of more generalized ECS patterns. I have a number of more speculative ideas here, but I think there are a few different directions we could go.
But yeah, we obviously need to make sure we don't regress current state performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also need to keep in mind maintenance. This PR manages to remove a lot of code and makes the render graph much more approachable for people that already know bevy's ECS. It's 3.8k less lines of code to maintain. Of course, line of code isn't a super useful metric for many things but I think it's significant enough that it can't just be ignored. With the current render graph, we almost never see PRs to it and I wouldn't be surprised if a big reason for that is that it looks so foreign in bevy and because it's also a lot of very complex code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the points made above about approachability, this also means dogfooding systems for a very performance critical part of Bevy. Working towards meeting the demands of that via systems and working on optimizations will then turn into benefits for everything else
| .add_systems(Render, prepare_cas_pipelines.in_set(RenderSystems::Prepare)) | ||
| .add_systems( | ||
| Core3d, | ||
| cas.after(fxaa) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewer note: fxaa is ordered after tonemapping, so the .after tonemapping here is redundant thus removed
| Core3d, | ||
| temporal_anti_alias | ||
| .after(Core3dSystems::StartMainPassPostProcessing) | ||
| .before(Core3dSystems::PostProcessing), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some ordering constraints were lost here
atlv24
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember to check Hide whitespace in diff view options!
Some notes:
- should we be using fallible systems to more closely match the Result behavior of the original nodes?
- some ordering constraints were removed but i think many were redundant. i noted these
- tracing infra probably is redundant with system traces now (followup)
- can probably use run conditions in a bunch of places too now (followup)
- render_context -> ctx could have been a different pr (and is a matter of taste but i agree with it anyways) its just noise here
- one of the solari files is jumbled and unreviewable
This is an absolutely colossal amount of work with incredible attention to detail
fantastic job, excited to see this merge. Lots of small fixes bundled here and its pretty much all good. There's some comments that were removed but i dont feel very strongly about them.
Very in favor of this merging, approval will come after comments addressed and pr exits draft status.
I'm going to dig into the CI failures soon
crates/bevy_core_pipeline/src/core_3d/main_transmissive_pass_3d_node.rs
Outdated
Show resolved
Hide resolved
crates/bevy_core_pipeline/src/experimental/mip_generation/mod.rs
Outdated
Show resolved
Hide resolved
atlv24
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yayyyyyy
solari and meshlet examples both work, checked a few other examples and all looks good
| bevy_ecs = { path = "../bevy_ecs", version = "0.19.0-dev" } | ||
| bevy_core_pipeline = { path = "../bevy_core_pipeline", version = "0.19.0-dev" } | ||
| bevy_diagnostic = { path = "../bevy_diagnostic", version = "0.19.0-dev" } | ||
| bevy_post_process = { path = "../bevy_post_process", version = "0.19.0-dev" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes clean compile a bit worse cus linearizes compilation of these two crates, but its probably fine. could use system sets to avoid it though
| // Early prepass build indirect parameters | ||
| early_prepass_build_indirect_parameters | ||
| .after(early_gpu_preprocess) | ||
| .before(early_prepass), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should use them whenever they help compilation parallelism i guess
andriyDev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, though I'm no rendering guru, just a regular ECS enjoyer :)
| StartMainPass, | ||
| EndMainPass, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need two of this here? Maybe this should just be MainPass and then users should just say .before(MainPass) or .after(MainPass)?
If we never say in_set for these, then I think these are the wrong system set names - most of our rendering systems IMO should be inside these sets, not scheduled around them.
Same reasoning for Core2dSystems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to think about this more. In the old render graph, they were empty place-holder nodes used only for relative ordering. But that kind of usage might be weird with system sets. I think you're right that we may just want to have a MainPass, although that does make us lose the ability to order relative to the start/end. I've wondered before whether it would be possible to have a system config that's like "as late as possible within set" which would function similarly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think I'd be in favour of a MainPass system set here. I'm pretty sure the only reason it's not like that right now is because the render graph didn't support that kind of api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've made that change! We now just have Prepass, MainPass, and PostProcess.
|
|
||
| #[derive(Component)] | ||
| struct BloomTexture { | ||
| pub struct BloomTexture { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not excited about how many of these things we're making public. We might consider adding SystemSets for these systems so we don't have to make these internal types public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguably, many more things in the renderer should be public. It can often be useful to access those things as users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think there's a valid conversation to have here about whether we want to use system sets preferentially over systems but I do think making things pub is nbd in general and we should typically favor it unless there's really good reason for encapsulation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've always hated working with the render graph API and this makes so much more sense to me. The UI rendering changes all seem fine.
I ran some naive benchmarks, and everything is great, except for one major regression with the many_buttons example with many-cameras enabled:
cargo run --example many_buttons --release -- --many-cameras --buttons 25
It's a very silly unrealistic benchmark though, so might not be an issue.
Let me at least take a peak to see if I can identify where the overhead is coming from. Thanks for checking these. |
| ), | ||
| ), | ||
| ); | ||
| #[cfg(all(feature = "dlss", not(feature = "force_disable_dlss")))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are there two separate functions? Use a type alias for the ViewQuery contents behind a cfg instead, like how Solari works on mainline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay yeah, I have cleaned this up as well as fixed a number of other divergences from main.
|
@ickshonpe Okay, that was a real change in behavior that was mostly unintentional. I was accidentally submitting after every camera rather than at the end of the graph. With that fixed in e29562b, it's now only showing a ~10% regression on that benchmark. |
Still seems like there might be some other problem. Even with
(Red=main)
|

render-graph-as-systems
Note
Remember to check hide whitespace in diff view options when reviewing this PR
This PR removes the
RenderGraphin favor of using systems.Motivation
The
RenderGraphAPI was originally created when the ECS was significantly more immature. It was also created with the intention of supporting an input/output based slot system for managing resources that has never been used. While resource management is an important potential use of a render graph, current rendering code doesn't make use of any patterns relating to it.Since the ECS has improved, the functionality of
Schedulehas basically become co-extensive with what theRenderGraphAPI is doing, i.e. ordering bits of system-like logic relative to one another and executing them in a big chunk. Additionally, while there's still desire for more advanced techniques like resource management in the graph, it's desirable to implement those in ECS terms rather than creating moreRenderGraphspecific abstraction.In short, this sets us up to iterate on a more ECS based approach, while deleting ~3k lines of mostly unused code.
Implementation
At a high level: We use
Scheduleas our "sub-graph." Rather than running the graph, we run a schedule. Systems can be ordered relative to one another.The render system uses a
RenderGraphschedule to define the "root" of the graph.core_pipelineadds acamera_driversystem that runs the per-camera schedules. This top level schedule provides an extension point for apps that may want to do custom rendering, or non-camera rendering.CurrentView/ViewQueryWhen running schedules per-camera in the
camera_driversystem, we insert aCurrentViewresource that's used to mark the currently iterating view. We also add a new paramViewQuerythat internally uses this resource to execute the query and skip the system if it doesn't match as a convenience.RenderContextThe
RenderContextis now a system param that wraps aDeferredfor tracking the state of the current command encoder and queued buffers.SystemBufferWe use an system buffer impl to track command encoders in the render context and rely on apply deferred in order to encode them all. Currently, this encodes them in series. There are likely opportunities here to make this more efficient.
Benchmarks
Bistro
Caldera
Future steps
There are a number of exciting potential changes that could follow here:
TODO: