-
-
Notifications
You must be signed in to change notification settings - Fork 716
PERF: Use tuned ImageRangeRegion copy in CastImageFilter #5670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,6 +20,7 @@ | |
|
|
||
| #include "itkProgressReporter.h" | ||
| #include "itkImageAlgorithm.h" | ||
| #include "itkImageRegionRange.h" | ||
|
|
||
| namespace itk | ||
| { | ||
|
|
@@ -139,29 +140,30 @@ CastImageFilter<TInputImage, TOutputImage>::DynamicThreadedGenerateDataDispatche | |
|
|
||
| this->CallCopyOutputRegionToInputRegion(inputRegionForThread, outputRegionForThread); | ||
|
|
||
| const unsigned int componentsPerPixel = outputPtr->GetNumberOfComponentsPerPixel(); | ||
| ImageRegionRange<const TInputImage> inputRange(*inputPtr, inputRegionForThread); | ||
| ImageRegionRange<TOutputImage> outputRange(*outputPtr, outputRegionForThread); | ||
|
|
||
| // Define the iterators | ||
| ImageScanlineConstIterator inputIt(inputPtr, inputRegionForThread); | ||
| ImageScanlineIterator outputIt(outputPtr, outputRegionForThread); | ||
| auto inputIt = inputRange.begin(); | ||
| auto outputIt = outputRange.begin(); | ||
| const auto inputEnd = inputRange.end(); | ||
|
|
||
| while (!inputIt.IsAtEnd()) | ||
| // Note: This loop has been timed for performance with conversions between image of vectors and VectorImages and other | ||
| // combinations. The following was evaluated to be the best performance usage of iterators. Important considerations: | ||
| // - Usage of NumericTraits::GetLength() is sometimes consant vs virutal method GetNumberOfComponentsPerPixel() | ||
| // - The construction of inputPixel and outputPixel for VectorImages both reference the internal buffer and don't | ||
| // require memory allocations. | ||
| const unsigned int componentsPerPixel = itk::NumericTraits<OutputPixelType>::GetLength(*outputIt); | ||
| while (inputIt != inputEnd) | ||
| { | ||
| while (!inputIt.IsAtEndOfLine()) | ||
| const InputPixelType & inputPixel = *inputIt; | ||
|
Comment on lines
+156
to
+158
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You can make it a beautiful range-based for loop, if you like 😃: for (const InputPixelType & inputPixel : inputRange)Then you may remove the variables
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How close are we to being able to do:
😈
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I didn't try, but such a zip might be possible. Very cute 😸 However... Update: The input and output region do indeed have the same number of pixels, as implied by the reply from @dzenanz at https://github.com/InsightSoftwareConsortium/ITK/pull/5670/files/2e8c70a4499a2e87c8e0f65d25a355baa82f3bb9#r2586937371 So then, it is indeed unnecessary to check both inputRange.end() and outputRange.end(). So in this case
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Using the range-based for loop in did not give the same performance is all cases in the test code. 🤷
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Interesting! Do you mean that my range-based
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By the way, the range-based So then the
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It appears measurable slower in the performance testing code. I don't know why. |
||
| OutputPixelType outputPixel{ *outputIt }; | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One alternative to this is to replace this by outside the while loop. Does it improve performance?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am probably missing some larger context behavior, but why do we need a temporary local variable for outputPixel? It seems that line 164 could be removed, and line 159 could be an alias to the current memory. OutputPixelType & outputPixel = *outputIt;
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I encourage both of you to compile and try experiments with PR #5669 @SimonRit Outside the loop it is a little slower ~2x for vector pixel output. It has odd statements to force a copy.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @hjmjohnson FYI, for VectorImage, |
||
| for (unsigned int k = 0; k < componentsPerPixel; ++k) | ||
| { | ||
| const InputPixelType & inputPixel = inputIt.Get(); | ||
| OutputPixelType value{ outputIt.Get() }; | ||
| for (unsigned int k = 0; k < componentsPerPixel; ++k) | ||
| { | ||
| value[k] = static_cast<typename OutputPixelType::ValueType>(inputPixel[k]); | ||
| } | ||
| outputIt.Set(value); | ||
|
|
||
| ++inputIt; | ||
| ++outputIt; | ||
| outputPixel[k] = static_cast<typename OutputPixelType::ValueType>(inputPixel[k]); | ||
| } | ||
| inputIt.NextLine(); | ||
| outputIt.NextLine(); | ||
| *outputIt = outputPixel; | ||
dzenanz marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| ++inputIt; | ||
| ++outputIt; | ||
| } | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, do inputRegionForThread and outputRegionForThread always have exactly the same number of pixels (as returned by
ImageRegion::GetNumberOfPixels()), after callingCallCopyOutputRegionToInputRegion?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the filter has some "radius" (frequent), then the input region is slightly larger (I believe). Each filter decides how to increase the region during propagation from output to input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at an older version, it looks like it assumes that the output region has the same size as the input region. Because it just checks inputIt.IsAtEnd() and inputIt.IsAtEndOfLine(), not outputIt.IsAtEnd() or outputIt.IsAtEndOfLine():
ITK/Modules/Filtering/ImageFilterBase/include/itkCastImageFilter.hxx
Lines 145 to 165 in 921c5ba
That's OK to me, as long as that assumption is OK 😃 So then, in the new code, it's also OK to just check
inputRange.end()and ignoreoutputRange.end().There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cast image filter does not need a larger input region, so it does not override the default
CallCopyOutputRegionToInputRegionimplementation, and has the same input and output regions.