Slicing: compare number of processors correctly #76
Open
alejandrogallo wants to merge 1 commit into cyclops-community:master from
Collaborator
Thanks a lot Alejandro! These changes seem to make the Python tests fail though, likely due to some subtleties involving sparsity, so I will need to investigate/adjust a bit before merging.
Hello,
we updated our version of `CTF` and were having some issues with regard to
the performance of slicing in our code.
After some sniffing around we found this line in the `slice` method.
In version `1.4.1` the `if` statement had the `<` operator in it.
Some time after that this was changed to `<=`, which according to my
understanding renders the `else` code block useless.
This means that when `tsr_B` is distributed among the same number of
processors as `tsr_A`, the dimensions and padding of `A` are also checked,
and `CTF` has to read the data from `A`,

    ... tsr_A->write(blk_sz_B, sr->mulid(), sr->addid(), blk_data_B, 'r'); ...

which makes this block slower.
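To make the equal-processor-count case concrete, here is a minimal sketch of how the choice of operator decides which block runs. It is not the CTF source: `Branch`, `select_branch_lt`, `select_branch_le` and `check_equal_case` are hypothetical names, and the operand order (`np_B` on the left) is an assumption. Only the `<` versus `<=` distinction comes from the description above.

```cpp
#include <cassert>

// Hypothetical stand-in for the two code paths of the slice method.
// ReadFromA: the `if` block, which checks dimensions/padding of A and reads from A.
// Other:     the `else` block.
enum class Branch { ReadFromA, Other };

// Assumption: np_B and np_A are the processor counts of tsr_B and tsr_A,
// compared with np_B on the left. Only the operator is taken from the PR text.
constexpr Branch select_branch_lt(int np_B, int np_A) {
  return (np_B < np_A) ? Branch::ReadFromA : Branch::Other;   // as in version 1.4.1
}

constexpr Branch select_branch_le(int np_B, int np_A) {
  return (np_B <= np_A) ? Branch::ReadFromA : Branch::Other;  // as in later versions
}

// The two variants pick different blocks only when the processor counts are equal.
inline void check_equal_case() {
  assert(select_branch_lt(4, 4) == Branch::Other);
  assert(select_branch_le(4, 4) == Branch::ReadFromA);
  assert(select_branch_lt(2, 4) == select_branch_le(2, 4));
  assert(select_branch_lt(8, 4) == select_branch_le(8, 4));
}
```

Under these assumptions, the only inputs for which the two operators disagree are those with equal processor counts, which is exactly the case this pull request changes.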
If this is true, this pull request would be a fix to the problem. We have
certainly tested it and it confirmed our suspicion. For slices of big tensors
the difference between the `<` and the `<=` version is up to 50% in time,
according to our benchmarks. However, for small tensors it appears to be
roughly equivalent, which makes sense.
Thank you very much for your great project!