Jj/cumsum nvfuserex opinfo tolerance #2586
base: main
Conversation
…race" This reverts commit 0354e9d. The promotion logic is also wrong; besides, having a type promotion alter the behavior seems like the wrong thing to do.
… original tests do run with un-promoted math
for more information, see https://pre-commit.ci
…ance' into jj/cumsum_nvfuserex_opinfo_tolerance
Numerics look pretty nasty for bf16/fp16.
@naoyam the test passed for me locally on your nvfuser branch.
Linking the related PR: NVIDIA/Fuser#5312
Could you include the link to the nvfuser PR in the comment?
Looks like Naoya added that right before your review. I added the link in the PR description.
nvfuserex's new codegen support for cumsum runs the math in reduced precision, as PyTorch does. This fails the OpInfo test, since the reference implementation uses double. Bumping the tolerance keeps CI happy.
Linking the related PR: NVIDIA/Fuser#5312
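To illustrate why the tolerance bump is needed, here is a minimal sketch (not the actual OpInfo test; it uses NumPy rather than the nvfuserex executor) showing how a cumulative sum accumulated in reduced precision drifts away from a float64 reference, so a tolerance calibrated for double-precision math fails spuriously:

```python
import numpy as np

# Hypothetical illustration: cumsum accumulates rounding error at every
# step, so running the accumulation in fp16 diverges from a float64
# reference far more than a single elementwise op would.
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)

ref = np.cumsum(x)  # float64 reference, like the OpInfo reference impl
lowp = np.cumsum(x.astype(np.float16), dtype=np.float16)  # reduced-precision math

abs_err = np.abs(lowp.astype(np.float64) - ref).max()
# abs_err is orders of magnitude larger than fp16 machine epsilon times
# the value scale, which is why the per-dtype tolerance must be bumped.
print(abs_err)
```

The error grows roughly with the running-sum magnitude times the low-precision epsilon, so longer sequences need proportionally looser tolerances for bf16/fp16.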