@accable commented Jul 2, 2025

This PR fixes a segmentation fault that occurs during tensor memory allocation for large extents.

The issue stems from indexing the global extent array when computing elementsA, elementsB, and elementsC. This happens to work for the original toy example {6, 6, 6, 4, 4, 4}, but it breaks when the dimensions and mode mappings differ, as in configurations such as {1, 8, 512, 8, 512}.

The fix replaces extent[...] with the corresponding per-tensor extents (extentA, extentB, and extentC), which are already computed from the mode permutations. Each allocation therefore matches the size implied by its tensor descriptor.
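As a rough illustration of the idea (not the exact code in this PR), the element count for each tensor can be taken as the product of that tensor's own extents. The helper names `countElements` and `allocationBytes` below are hypothetical and only sketch the approach, assuming the per-tensor extents are stored as `std::vector<int64_t>`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of elements in a tensor whose per-mode
// extents are listed in `extents` (e.g. extentA, extentB, or extentC).
// Using the tensor's own extents keeps the count consistent with the
// descriptor, regardless of how modes are permuted.
std::size_t countElements(const std::vector<int64_t>& extents) {
    std::size_t elements = 1;
    for (int64_t e : extents) {
        assert(e >= 0 && "extents must be non-negative");
        elements *= static_cast<std::size_t>(e);
    }
    return elements;
}

// Hypothetical helper: bytes to allocate for a tensor with the given
// extents and per-element size.
std::size_t allocationBytes(const std::vector<int64_t>& extents,
                            std::size_t elementSize) {
    return countElements(extents) * elementSize;
}
```

For instance, with an assumed extentA of {8, 512} and 4-byte elements, `countElements` gives 4096 elements and `allocationBytes` gives 16384 bytes.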

Tested on NVIDIA Grace with this repo.
