performance variation when measuring the perf with "do_bench()" in triton.testing #807
Unanswered
stephen-youn asked this question in Q&A
Replies: 0 comments
Hi All,
I have a question about what could cause the variation we see in measured performance when we run do_bench and time the kernel through torch.cuda.Event (link).
I also wonder whether the code needs an extra torch.cuda.synchronize() after every end_event record (link); see the sketch below for what I mean.
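To make that concrete, here is a minimal sketch of the timing pattern as I understand it (simplified from do_bench; `fn` stands in for the benchmarked kernel, and the L2-cache clearing between runs is omitted):

```python
import torch

def time_kernel(fn, n_repeat=100):
    # one (start, end) event pair per measured iteration
    starts = [torch.cuda.Event(enable_timing=True) for _ in range(n_repeat)]
    ends = [torch.cuda.Event(enable_timing=True) for _ in range(n_repeat)]
    for i in range(n_repeat):
        starts[i].record()
        fn()
        ends[i].record()
        # would a torch.cuda.synchronize() be needed here, after each
        # end-event record, or is the single sync below sufficient?
    torch.cuda.synchronize()  # one sync at the end drains the whole stream
    return [s.elapsed_time(e) for s, e in zip(starts, ends)]
```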
The other question is how the "5" was chosen as the initial number of runs used to estimate n_warmup and n_repeat, and what the reasoning behind the formula is.
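For reference, this is roughly how I read that part of the code (a sketch, not the exact source; the variable names follow do_bench):

```python
import torch

def estimate_counts(fn, warmup=25, rep=100):
    # time 5 back-to-back runs to get a rough per-run estimate
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(5):  # the hard-coded 5 in question
        fn()
    end.record()
    torch.cuda.synchronize()
    estimate_ms = start.elapsed_time(end) / 5
    # convert the warmup/rep time budgets (in ms) into iteration counts
    n_warmup = max(1, int(warmup / estimate_ms))
    n_repeat = max(1, int(rep / estimate_ms))
    return n_warmup, n_repeat
```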
thanks