solidpixel commented Jul 21, 2025

This PR adds a profiling layer which can collect API-correlated performance information, such as performance counters, from Arm GPUs.

Task list

  • Add CPU-GPU sync points for all workloads
  • Add pipeline barrier serialization for all workloads (see support layer)
  • Add queue submit serialization for all workloads (see support layer)
  • Add config handling to enable only for selected frames
  • Filter sync and serialization so it only applies to the selected ranges
  • Update developer documentation to reflect final design
  • Update user documentation to reflect final design
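
As an illustration of the frame-selection and filtering tasks above, here is a minimal sketch of how a layer might decide whether a given frame falls inside a configured range. It is not taken from this PR: the `FrameRange` type, the `isFrameSelected` function, and the inclusive-range convention are all hypothetical, and the actual config format and data structures may differ.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical inclusive frame range [first, last]; an assumption for this
// sketch, not the layer's real configuration type.
struct FrameRange
{
    uint64_t first;
    uint64_t last;
};

// Return true if the frame lies inside any configured range.
// In this sketch an empty range list selects no frames.
inline bool isFrameSelected(const std::vector<FrameRange>& ranges, uint64_t frame)
{
    for (const auto& range : ranges)
    {
        if (frame >= range.first && frame <= range.last)
        {
            return true;
        }
    }

    return false;
}
```

Under these assumptions, the sync points and serialization would be added only when `isFrameSelected()` returns true for the current frame, so frames outside the selected ranges run without the extra overhead.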

For future PRs:

  • Add config handling to enable based on times not frames.
  • Add custom counter selection to the config files.
  • Add timeline-like metadata export into the CSV files.
  • Make the additional time wait conditional (although this seems low value).
  • Expose debug_utils even if the driver doesn't (see #134, "Allow a layer to implement extensions the driver doesn't").

solidpixel marked this pull request as draft July 21, 2025 09:17
solidpixel force-pushed the profile branch 3 times, most recently from dea2448 to f948a4e July 21, 2025 14:57
solidpixel force-pushed the profile branch 4 times, most recently from bd5bbc3 to d105c5c July 24, 2025 09:02
solidpixel force-pushed the profile branch 2 times, most recently from 0e4b3f5 to 38045fd July 24, 2025 09:25
solidpixel marked this pull request as ready for review July 24, 2025 12:38
solidpixel merged commit 02f5b26 into main Jul 24, 2025
6 checks passed
solidpixel deleted the profile branch July 24, 2025 14:54