jjsjann123
Collaborator

nvfuserex's new codegen support for cumsum runs the math in reduced precision, as PyTorch does.
This fails the OpInfo test, since the reference implementation uses double. Bumping the tolerance to keep CI happy.
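
For illustration only, here is a minimal sketch of why a reduced-precision cumsum drifts away from a double-precision reference. It is not the test change in this PR and does not call nvFuser or PyTorch: it models fp16 accumulation with the standard-library `struct` half-precision format, and the helper names (`to_fp16`, `cumsum_fp16`, `cumsum_fp64`) and the input of 4096 copies of 0.1 are assumptions made up for the example.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float (double) to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def cumsum_fp16(values):
    """Running sum with every input and every partial sum rounded to fp16.

    Hypothetical model of reduced-precision accumulation, not nvFuser's kernel.
    """
    out, acc = [], 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))
        out.append(acc)
    return out


def cumsum_fp64(values):
    """Running sum in double precision, standing in for the reference."""
    out, acc = [], 0.0
    for v in values:
        acc += v
        out.append(acc)
    return out


values = [0.1] * 4096
low = cumsum_fp16(values)
ref = cumsum_fp64(values)

# Largest relative deviation of the fp16 running sum from the double reference.
max_rel_err = max(abs(a - b) / abs(b) for a, b in zip(low, ref))
print(f"last fp16 value: {low[-1]}, last fp64 value: {ref[-1]}")
print(f"max relative error: {max_rel_err:.3e}")
```

Rounding error compounds along the scan, and once the running sum is large enough that an increment is below half an ulp, the low-precision sum can stop growing altogether. That is the kind of gap a tolerance tuned for a double reference will flag.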

@jjsjann123 jjsjann123 marked this pull request as ready for review October 2, 2025 23:26
@jjsjann123
Collaborator Author

Numerics look pretty nasty for bf16/fp16.

@jjsjann123
Collaborator Author

@naoyam the test passed for me locally on your nvfuser branch.

@naoyam
Collaborator

naoyam commented Oct 2, 2025

Linking the related PR NVIDIA/Fuser#5312

Collaborator

@crcrpar crcrpar left a comment

Could you include the link to the nvFuser PR in the comment?
