
Conversation

@ajrasane
Contributor

What does this PR do?

Type of change:
Example update

Overview:

  • Optimize the benchmarking function in the diffusers example
python diffusion_trt.py --model flux-dev --benchmark --model-dtype BFloat16 --skip-image --torch

Testing

Backbone-only inference latency (BFloat16):
  Average: 139.48 ms
  P50: 139.36 ms
  P95: 141.13 ms
  P99: 141.35 ms
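Latency summaries like the ones above are typically collected by timing repeated forward passes after a warmup phase and then summarizing the distribution. A minimal, CPU-only sketch of that pattern (the actual script times the GPU backbone, which requires CUDA-event-based timing; `run_backbone` here is a hypothetical stand-in for the timed call):

```python
import time

def run_backbone():
    # Hypothetical stand-in for the timed forward pass; the real
    # benchmark runs the diffusion backbone on the GPU.
    sum(i * i for i in range(10_000))

def benchmark(fn, warmup=3, iters=20):
    """Time `fn` over `iters` runs (after `warmup` untimed runs); returns ms."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1000.0)
    return times

times = benchmark(run_backbone)
print(f"Average: {sum(times) / len(times):.2f} ms")
```

Note that `time.perf_counter` only measures wall-clock time around the call; for GPU work, the real script must account for asynchronous kernel execution, which is what the CUDA-event discussion below the diff is about.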

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

@ajrasane ajrasane requested a review from a team as a code owner October 31, 2025 01:18
@ajrasane ajrasane self-assigned this Oct 31, 2025
@ajrasane ajrasane requested a review from cjluo-nv October 31, 2025 01:18
@codecov

codecov bot commented Oct 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.39%. Comparing base (ca94c96) to head (646458a).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #487      +/-   ##
==========================================
+ Coverage   74.36%   74.39%   +0.02%     
==========================================
  Files         181      182       +1     
  Lines       18192    18209      +17     
==========================================
+ Hits        13529    13546      +17     
  Misses       4663     4663              

☔ View full report in Codecov by Sentry.
@kevalmorabia97
Collaborator

Please make sure to run the internal GitLab diffusers CI/CD tests to verify they don't break with this change.

@ajrasane ajrasane force-pushed the ajrasane/benchmark_diffusers branch from 89f6c25 to 1aafbbc Compare November 7, 2025 19:28
Signed-off-by: ajrasane <[email protected]>
@ajrasane ajrasane force-pushed the ajrasane/benchmark_diffusers branch from 094aa94 to 646458a Compare November 7, 2025 20:05
def forward_hook(_module, _input, _output):
_ = backbone(**dummy_inputs_dict)
end_event.record()
torch.cuda.synchronize()
Collaborator

I don't think you need to call sync here.


avg_latency = sum(times) / len(times)
times = sorted(times)
p50 = times[len(times) // 2]
Collaborator

I suggest you use numpy.percentile for these instead.
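Following that suggestion, the percentile computation could look like this sketch (`times` is filled with made-up sample latencies here, standing in for the collected per-iteration timings in milliseconds):

```python
import numpy as np

# Hypothetical per-iteration latencies in milliseconds.
times = [139.1, 139.3, 139.4, 139.5, 139.6, 140.2, 141.0, 141.3]

avg_latency = float(np.mean(times))
# np.percentile interpolates between samples, which is more robust than
# indexing into a sorted list, especially for small sample counts.
p50, p95, p99 = np.percentile(times, [50, 95, 99])

print(f"Average: {avg_latency:.2f} ms")
print(f"P50: {p50:.2f} ms  P95: {p95:.2f} ms  P99: {p99:.2f} ms")
```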



5 participants