@SwapnilGhanshyala SwapnilGhanshyala commented Dec 27, 2023

When maximal fusion is requested, slice computation can skip the check on whether the sibling loop is parallel, since maximal fusion does not weigh performance improvement. This change modifies slice computation to skip that check when the maximal flag is set, adds a boolean member to the FusionStrategy class to represent maximal fusion, and updates the slice computation methods to receive information about the selected strategy.

Issue: Maximal fusion produces unnecessarily nested loops (llvm#61820).
Original pipeline: If the sibling loop was sequential, its bounds were set to constants (the original bounds). This forced the loops to be nested instead of fused. In standard (non-maximal) fusion, this strategy would be rejected by the performance heuristics and another iteration of bounds calculation would start. In maximal fusion, however, the nested strategy is not rejected.
Solution: In maximal fusion, skip the check on whether the sibling is parallel. This prevents the bounds from being set to constants.
Reasoning: Fusing a parallel loop with a sequential loop produces a sequential loop. Disallowing this appears to be a performance-based decision, so maximal fusion can safely ignore the check.
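
To make the reasoning above concrete, here is a minimal C++ sketch of two sibling loops that read the same input, one parallel and one sequential, together with their fused form. It is only an illustration of the claim that the fused loop is sequential while still computing the same values; the function names and the loop bodies are hypothetical and are not taken from the PR, from llvm#61820, or from MLIR.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration, not code from the PR or from MLIR.
// Two sibling loops read the same input. The first has independent
// iterations (parallel); the second carries a dependence on acc[i - 1]
// (sequential).
std::pair<std::vector<int>, std::vector<int>>
runUnfused(const std::vector<int> &in) {
  std::vector<int> scaled(in.size()), acc(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) // parallel loop
    scaled[i] = 2 * in[i];
  for (std::size_t i = 0; i < in.size(); ++i) // sequential loop
    acc[i] = (i == 0 ? 0 : acc[i - 1]) + in[i];
  return {scaled, acc};
}

// Fused form: a single loop holding both bodies. The carried dependence
// on acc[i - 1] makes the whole fused loop sequential.
std::pair<std::vector<int>, std::vector<int>>
runFused(const std::vector<int> &in) {
  std::vector<int> scaled(in.size()), acc(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    scaled[i] = 2 * in[i];
    acc[i] = (i == 0 ? 0 : acc[i - 1]) + in[i];
  }
  return {scaled, acc};
}

// Usage check: both versions compute the same values.
void checkEquivalence(const std::vector<int> &in) {
  assert(runUnfused(in) == runFused(in));
}
```

The fused version trades the parallelism of the first loop for a single traversal of the input, which is why rejecting it reads as a performance decision rather than a correctness one in this simple case.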

@SwapnilGhanshyala
Author

Hi @bondhugula, a gentle reminder for a review.

@bondhugula bondhugula left a comment

The API update isn't proper.

Comment on lines 1750 to 1747

else if

Comment on lines 406 to 407

This isn't proper to do. computeSliceUnion shouldn't have anything to do with the fusion strategy or anything fusion related. You'll have to rethink what option to pass if the behavior has to change here.

Comment on lines 1745 to 1747

Again this is an analysis utility and should not refer to any specific fusion-related things.

Comment on lines 1746 to 1748

For example, one can add an option skipParallelismCheck.

Lower-level utilities shouldn't refer to or encode any semantics from higher-level/client utilities.

@SwapnilGhanshyala
Author

I see, this lower-level utility might be used by mechanisms other than loop fusion and should be agnostic of its caller. So rather than passing the fusionStrategy, passing a flag/option to skip the parallelism check is more appropriate.

Will make the changes.
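
A rough sketch of the layering discussed here, assuming heavily simplified stand-in types: the analysis-level utility only sees a generic skipParallelismCheck option, and the fusion client translates its own policy into that option. Loop, SliceBounds, FusionPolicy, computeSliceBounds, and sliceForFusion are all hypothetical names for illustration; they are not the MLIR types or signatures touched by this PR.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these are real MLIR types or signatures.
struct Loop {
  int lowerBound;
  int upperBound;
  bool isParallel;
};

struct SliceBounds {
  int lowerBound;
  int upperBound;
  bool pinnedToOriginal; // true if the original constant bounds were kept
};

// Analysis-level utility: it knows only about a generic option, not about
// who is calling it or why.
SliceBounds computeSliceBounds(const Loop &sibling, int wantedLb, int wantedUb,
                               bool skipParallelismCheck = false) {
  assert(wantedLb <= wantedUb && "slice range must be well formed");
  if (!skipParallelismCheck && !sibling.isParallel) {
    // Sequential sibling: keep the loop's original constant bounds.
    return {sibling.lowerBound, sibling.upperBound, true};
  }
  return {wantedLb, wantedUb, false};
}

// Client-level code: the fusion pass owns the policy and translates it
// into the generic option before calling the utility.
struct FusionPolicy {
  bool maximal;
};

SliceBounds sliceForFusion(const FusionPolicy &policy, const Loop &sibling,
                           int wantedLb, int wantedUb) {
  return computeSliceBounds(sibling, wantedLb, wantedUb,
                            /*skipParallelismCheck=*/policy.maximal);
}
```

With this split, another client could request the same behavior without knowing anything about fusion, which is the point of the review comment.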

Author

@SwapnilGhanshyala SwapnilGhanshyala Jan 30, 2024

Modified the signatures of getComputeSliceState() and computeSliceUnion() to also receive a bool skipParallelismCheck, a flag that skips the parallelism check on the sibling loop when set.

Author

@SwapnilGhanshyala SwapnilGhanshyala Jan 30, 2024

Modified the FusionStrategy class to have a member bool maximalFusion, which records whether fusion-maximal is set. Also added a method that returns its value.

Author

@SwapnilGhanshyala SwapnilGhanshyala left a comment

Hi @bondhugula, pushed the changes addressing previous comments.
Importantly, addressed the comment to make the slice computation agnostic of where it is called from.

@bondhugula bondhugula left a comment

Missing test case!

heuristic

All inputs should appear before outputs. sliceUnion is an output arg.
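
As a small illustration of this ordering convention, the sketch below places every input (the data, then an option flag) ahead of the single output argument. RangeUnion, computeRangeUnion, and the allowEmpty flag are hypothetical and exist only to show the parameter order; they do not mirror the real computeSliceUnion signature.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a union result; not an MLIR type.
struct RangeUnion {
  int lowerBound;
  int upperBound;
};

// Inputs first (the ranges, then the option flag); the output argument last.
// Returns false and leaves *result untouched when there is nothing to merge
// and allowEmpty is false.
bool computeRangeUnion(const std::vector<std::pair<int, int>> &ranges,
                       bool allowEmpty, RangeUnion *result) {
  assert(result && "output argument must not be null");
  if (ranges.empty()) {
    if (!allowEmpty)
      return false;
    *result = RangeUnion{0, 0};
    return true;
  }
  RangeUnion merged{ranges.front().first, ranges.front().second};
  for (const auto &range : ranges) {
    merged.lowerBound = std::min(merged.lowerBound, range.first);
    merged.upperBound = std::max(merged.upperBound, range.second);
  }
  *result = merged;
  return true;
}
```

Adding a new option in the middle of the inputs, rather than after the output, keeps call sites readable: everything the function reads comes before the one thing it writes.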

Maximal fusion

Need to add a code comment here.

Author

Done.

Author

Added unit test.

…1820)

When maximal fusion is requested, slice computation can skip checking
whether the sibling loop is parallel, since maximal fusion does not care
about performance improvement. Modified slice computation to skip said
check if the skipParallelismCheck flag is set. Modified the
FusionStrategy class to have a boolean member representing maximal
fusion and modified slice computation methods to pass the information
about the selected strategy.

Added test case in loop-fusion-4.mlir

@SwapnilGhanshyala
Author

Hi @bondhugula, addressed the comments and added a unit test to loop-fusion-4.mlir.

Author

@SwapnilGhanshyala SwapnilGhanshyala left a comment

Hi @bondhugula, a gentle reminder for review.

@SwapnilGhanshyala
Author

Hi @bondhugula, gentle reminder for review.

@bondhugula bondhugula left a comment

This isn't the right fix or approach. The incorrect fusion has nothing specific to do with sequential or parallel loops, nor with maximal fusion (it could also happen with non-maximal fusion where the compute tolerance allows sufficient redundant computation), nor is it specific to reduction nests. The invalid fusion here is due to cyclic dependences in the source nest. This is fixed comprehensively by llvm#128397.
It also required a fix to a check on the compute tolerance.

@bondhugula bondhugula closed this Feb 23, 2025