@pbchekin (Contributor) commented on Apr 23, 2025

Adds three new command-line options to scripts/test-triton.sh: --minicore, --mxfp, and --scaled_dot.

The semantics of --core are unchanged: it should run the same tests as before.
--minicore should be much faster than --core. The remaining tests from the core group can be run separately with --mxfp --scaled_dot.

Required for #3976.

@pbchekin merged commit 0cf724f into main on Apr 24, 2025
4 of 5 checks passed
@pbchekin deleted the split-core branch on April 24, 2025 at 02:32
david-hls pushed a commit to david-hls/intel-xpu-backend-for-triton that referenced this pull request on Jun 18, 2025
This caching seems to be responsible for some CUDA OOMs we encountered
in Meta-internal builds. I don't have a reduced repro, but this change
does seem to fix things. My hypothesis is that the cached stream is
causing the memory allocated for the graph to be retained.