Commit 75530bd

Merge pull request #85 from dzenanz/master
ENH: adding documentation for refactored multi-threading infrastructure
2 parents 245d888 + 0be8653 commit 75530bd

2 files changed

+111
-36
lines changed

SoftwareGuide/Latex/Architecture/SystemOverview.tex

Lines changed: 88 additions & 18 deletions
@@ -18,7 +18,7 @@ \section{System Organization}
1818
concepts include generic programming, smart pointers for memory
1919
management, object factories for adaptable object instantiation,
2020
event management using the command/observer design paradigm, and
21-
multithreading support.
21+
multi-threading support.
2222

2323
\item[Numerics.] ITK uses VXL's VNL numerics libraries. These are
2424
easy-to-use C++ wrappers around the Netlib Fortran numerical
@@ -36,7 +36,7 @@ \section{System Organization}
3636
\emph{filters} that in turn may be organized into data flow
3737
\emph{pipelines}. These pipelines maintain state and therefore
3838
execute only when necessary. They also support
39-
multithreading, and are streaming capable (i.e., can operate
39+
multi-threading, and are streaming capable (i.e., can operate
4040
on pieces of data to minimize the memory footprint).
4141

4242
\item[IO Framework.] Associated with the data processing
@@ -375,30 +375,100 @@ \subsection{Event Handling}
375375
\subsection{Multi-Threading}
376376
\label{sec:MultiThreading}
377377

378-
Multithreading is handled in ITK through a high-level design
379-
abstraction. This approach provides portable multithreading and hides the
378+
Multi-threading is handled in ITK through a high-level design
379+
abstraction. This approach provides portable multi-threading and hides the
380380
complexity of differing thread implementations on the many systems supported
381-
by ITK. For example, the class \doxygen{MultiThreader} provides support for
382-
multithreaded execution using \code{sproc()} on an SGI, or
383-
\code{pthread\_create} on any platform supporting POSIX threads.
384-
385-
Multithreading is typically employed by an algorithm during its execution
386-
phase. MultiThreader can be used to execute a single method on
387-
multiple threads, or to specify a method per thread. For example, in the
381+
by ITK. For example, the class \doxygen{PlatformMultiThreader} provides support for
382+
multi-threaded execution by directly using platform-specific primitives such as
383+
\code{pthread\_create}. \doxygen{TBBMultiThreader} uses Intel's
384+
Threading Building Blocks cross-platform library,
385+
which can do dynamic workload balancing across multiple processors.
386+
This means that \code{outputRegionForThread} might have different sizes
387+
which change over time, depending on overall processor load.
388+
All multi-threader implementations derive from \doxygen{MultiThreaderBase}.
389+
390+
Multi-threading is typically employed by an algorithm during its execution
391+
phase. For example, in the
388392
class \doxygen{ImageSource} (a superclass for most image processing filters)
389393
the \code{GenerateData()} method uses the following methods:
390394

391395
\small
392396
\begin{minted}[baselinestretch=1,fontsize=\footnotesize,linenos=false,bgcolor=ltgray]{cpp}
393-
multiThreader->SetNumberOfThreads(int);
394-
multiThreader->SetSingleMethod(ThreadFunctionType, void* data);
395-
multiThreader->SingleMethodExecute();
397+
this->GetMultiThreader()->template ParallelizeImageRegion<OutputImageDimension>(
398+
this->GetOutput()->GetRequestedRegion(),
399+
[this](const OutputImageRegionType & outputRegionForThread)
400+
{ this->DynamicThreadedGenerateData(outputRegionForThread); }, this);
401+
\end{minted}
402+
\normalsize
403+
404+
In this example each thread invokes the \code{DynamicThreadedGenerateData}
405+
method of the derived filter. The \code{ParallelizeImageRegion}
406+
method takes care to divide the image into different regions
407+
that do not overlap for write operations.
408+
\code{ImageSource}'s \code{GenerateData()} passes the \code{this} pointer
409+
to \code{ParallelizeImageRegion}, which allows \code{ParallelizeImageRegion}
410+
to update the filter's progress after each region is finished processing.
411+
412+
If a filter has some serial part in the middle, in addition to initialization
413+
done in \code{BeforeThreadedGenerateData()} and finalization done in
414+
\code{AfterThreadedGenerateData()}, it can parallelize more than one method
415+
in its own version of \code{GenerateData()}, as is done by
416+
\doxygen{CannyEdgeDetectionImageFilter}:
417+
418+
\small
419+
\begin{minted}[baselinestretch=1,fontsize=\footnotesize,linenos=false,bgcolor=ltgray]{cpp}
420+
::GenerateData()
421+
{
422+
this->UpdateProgress(0.0f);
423+
Superclass::AllocateOutputs();
424+
// Small serial section
425+
this->UpdateProgress(0.01f);
426+
427+
// Calculate 2nd order directional derivative
428+
this->GetMultiThreader()->template ParallelizeImageRegion<TOutputImage::ImageDimension>(
429+
this->GetOutput()->GetRequestedRegion(),
430+
[this](const OutputImageRegionType & outputRegionForThread)
431+
{ this->ThreadedCompute2ndDerivative(outputRegionForThread); }, nullptr);
432+
this->UpdateProgress(0.45f);
433+
434+
// Calculate the gradient of the second derivative
435+
this->GetMultiThreader()->template ParallelizeImageRegion<TOutputImage::ImageDimension>(
436+
this->GetOutput()->GetRequestedRegion(),
437+
[this](const OutputImageRegionType & outputRegionForThread)
438+
{ this->ThreadedCompute2ndDerivativePos(outputRegionForThread); }, nullptr);
439+
this->UpdateProgress(0.9f);
440+
441+
// More processing
442+
this->UpdateProgress(1.0f);
443+
}
396444
\end{minted}
397445
\normalsize
398446

399-
In this example each thread invokes the same method. The multithreaded filter
400-
takes care to divide the image into different regions that do not overlap for
401-
write operations.
447+
When invoking \code{ParallelizeImageRegion} multiple times from
448+
\code{GenerateData()}, \code{nullptr} should be passed instead of \code{this},
449+
otherwise progress will go from 0\% to 100\% more than once. This
450+
will at least confuse any other class watching the filter's progress events,
451+
even if it does not cause a crash. The filter's author should therefore estimate
452+
how long each part of \code{GenerateData()} takes, and update the progress
453+
``manually'' as in the example above.
454+
455+
With ITK version 5.0, the multi-threading mechanism has been refactored.
456+
What was previously \code{itk::MultiThreader} is now a hierarchy of classes.
457+
\doxygen{PlatformMultiThreader} is a slightly cleaned-up version of the old
458+
class: its \code{MultipleMethodExecute} and \code{SpawnThread} methods have been
459+
deprecated, and much of its content has been moved to
460+
\doxygen{MultiThreaderBase}. Classes should use the multi-threaders via the
461+
\code{MultiThreaderBase} interface, to allow the end user the flexibility to
462+
select the multi-threader at run time. This also allows the filter to
463+
benefit from future improvements in threading, such as the addition of a new
464+
multi-threader implementation.
465+
466+
The backwards-compatible \code{ThreadedGenerateData(Region, ThreadId)} method
467+
signature has been kept, for use in filters that must know their thread number.
468+
To use this signature, a filter must invoke
469+
\code{this->DynamicMultiThreadingOff();} before \code{Update();} is called by
470+
the filter's user or a downstream filter in the pipeline. The best place for
471+
invoking \code{this->DynamicMultiThreadingOff();} is the filter's constructor.
402472

403473
The general philosophy in ITK regarding thread safety is that accessing
404474
different instances of a class (and its methods) is a thread-safe operation.
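
The region splitting described in the hunk above can be illustrated with a small standalone sketch. It does not use ITK: `Region`, `SplitRegion`, and `ParallelizeRegion` are hypothetical names invented here, the region is reduced to a one-dimensional index range, and the split is a fixed static one rather than the dynamic balancing a TBB-backed threader may perform. It only shows the idea of handing each thread a non-overlapping piece and a callback that receives nothing but that piece.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for an image region: a half-open index range [begin, end).
struct Region
{
  std::size_t begin;
  std::size_t end;
};

// Split `whole` into at most `pieces` contiguous, non-overlapping sub-regions.
std::vector<Region> SplitRegion(const Region & whole, std::size_t pieces)
{
  assert(whole.begin <= whole.end);
  std::vector<Region> result;
  const std::size_t total = whole.end - whole.begin;
  if (total == 0 || pieces == 0)
  {
    return result;
  }
  pieces = std::min(pieces, total);
  const std::size_t base = total / pieces;
  const std::size_t extra = total % pieces; // the first `extra` pieces get one more element
  std::size_t start = whole.begin;
  for (std::size_t i = 0; i < pieces; ++i)
  {
    const std::size_t length = base + (i < extra ? 1 : 0);
    result.push_back(Region{ start, start + length });
    start += length;
  }
  return result;
}

// Run `work` once per sub-region, each on its own thread, and wait for all of them.
void ParallelizeRegion(const Region & whole,
                       std::size_t pieces,
                       const std::function<void(const Region &)> & work)
{
  const std::vector<Region> regions = SplitRegion(whole, pieces);
  std::vector<std::thread> threads;
  threads.reserve(regions.size());
  for (const Region & r : regions)
  {
    // `work` outlives the threads because they are joined before returning.
    threads.emplace_back([&work, r]() { work(r); });
  }
  for (std::thread & t : threads)
  {
    t.join();
  }
}
```

Because the sub-regions never overlap, a callback that writes only to indices inside the region it was given needs no locking, which is the same property the diff attributes to `ParallelizeImageRegion`.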
@@ -501,7 +571,7 @@ \section{Data Representation}
501571

502572
One of the important ITK concepts regarding images is that rectangular,
503573
continuous pieces of the image are known as \emph{regions}. Regions are used
504-
to specify which part of an image to process, for example in multithreading,
574+
to specify which part of an image to process, for example in multi-threading,
505575
or which part to hold in memory. In ITK there are three common types of
506576
regions:
507577
\begin{enumerate}

SoftwareGuide/Latex/DevelopmentGuidelines/WriteAFilter.tex

Lines changed: 23 additions & 18 deletions
@@ -6,7 +6,7 @@ \chapter{How To Write A Filter}
66
parts. An initial definition of terms is followed by an overview of
77
the filter creation process. Next, data streaming is discussed. The
88
way data is streamed in ITK must be understood in order to write
9-
correct filters. Finally, a section on multithreading describes what
9+
correct filters. Finally, a section on multi-threading describes what
1010
you must do in order to take advantage of shared memory parallel
1111
processing.
1212

@@ -160,15 +160,15 @@ \section{Overview of Filter Creation}
160160
parts of the implementation: defining the API, data members, and other
161161
implementation details of the algorithm. In particular, the filter writer
162162
will have to implement either a \code{GenerateData()} (non-threaded) or
163-
\code{ThreadedGenerateData()} method. (See Section~\ref{sec:MultiThreading}
164-
for an overview of multi-threading in ITK.)
163+
a \code{DynamicThreadedGenerateData()} or \code{ThreadedGenerateData()} (threaded) method.
164+
(See Section~\ref{sec:MultiThreading} for an overview of multi-threading in ITK.)
165165

166166
An important note: the GenerateData() method is required to allocate memory
167167
for the output. The ThreadedGenerateData() method is not. In the default
168168
implementation (see \doxygen{ImageSource}, a superclass of
169169
\doxygen{ImageToImageFilter})
170170
\code{GenerateData()} allocates memory and then invokes
171-
\code{ThreadedGenerateData()}.
171+
\code{DynamicThreadedGenerateData()} or \code{ThreadedGenerateData()}.
172172

173173
One of the most important decisions that the developer must make is whether
174174
the filter can stream data; that is, process just a portion of the input to
@@ -205,7 +205,7 @@ \section{Streaming Large Data}
205205
A significant benefit of this architecture is that the relatively complex
206206
process of managing pipeline execution is designed into the system. This
207207
means that keeping the pipeline up to date, executing only those portions of
208-
the pipeline that have changed, multithreading execution, managing memory
208+
the pipeline that have changed, multi-threading execution, managing memory
209209
allocation, and streaming is all built into the architecture. However, these
210210
features do introduce complexity into the system, the bulk of which is seen
211211
by class developers. The purpose of this chapter is to describe the pipeline
@@ -242,9 +242,9 @@ \subsection{Overview of Pipeline Execution}
242242
data processing kernels, that affect how much data input data
243243
(extra padding) is required.
244244

245-
\item It subdivides data into subpieces for multithreading. (Note
245+
\item It subdivides data into subpieces for multi-threading. (Note
246246
that the division of data into subpieces is exactly the same problem as
247-
dividing data into pieces for streaming; hence multithreading comes
247+
dividing data into pieces for streaming; hence multi-threading comes
248248
for free as part of the streaming architecture.)
249249

250250
\item It may free (or release) output data if filters no longer need
@@ -444,7 +444,7 @@ \subsubsection{UpdateOutputData()}
444444

445445
The developer will never override \code{UpdateOutputData()}. The developer need
446446
only write the \code{GenerateData()} method (non-threaded) or
447-
\code{ThreadedGenerateData()} method. A discussion of threading follows in the
447+
\code{DynamicThreadedGenerateData()} method. A discussion of threading follows in the
448448
next section.
449449

450450

@@ -454,29 +454,34 @@ \section{Threaded Filter Execution}
454454

455455
Filters that can process data in pieces can typically multi-process
456456
using the data parallel, shared memory implementation built into the
457-
pipeline execution process. To create a multithreaded filter, simply
458-
define and implement a \code{ThreadedGenerateData()} method. For
459-
example, a \doxygen{ImageToImageFilter} would create the method:
457+
pipeline execution process. To create a multi-threaded filter, simply
458+
define and implement a \code{DynamicThreadedGenerateData()} method.
459+
For example, an \doxygen{ImageToImageFilter} would create the method:
460460

461461
\small
462462
\begin{minted}[baselinestretch=1,fontsize=\footnotesize,linenos=false,bgcolor=ltgray]{cpp}
463-
void ThreadedGenerateData( const OutputImageRegionType&
464-
outputRegionForThread, ThreadIdType threadId ) override;
463+
void DynamicThreadedGenerateData( const OutputImageRegionType&
464+
outputRegionForThread ) override;
465465
\end{minted}
466466
\normalsize
467467

468-
The key to threading is to generate output for the output region given (as
469-
the first parameter in the argument list above). In ITK, this is simple to do
468+
The key to threading is to generate output for the output region given as
469+
the parameter. In ITK, this is simple to do
470470
because an output iterator can be created using the region provided. Hence
471471
the output can be iterated over, accessing the corresponding input pixels as
472472
necessary to compute the value of the output pixel.
473473

474474
Multi-threading requires caution when performing I/O (including using
475475
\code{cout} or \code{cerr}) or invoking events. A safe practice is to allow
476476
only thread id zero to perform I/O or generate events. (The thread id is
477-
passed as argument into \code{ThreadedGenerateData()}). If more than one
478-
thread tries to write to the same place at the same time, the program can
479-
behave badly, and possibly even deadlock or crash.
477+
passed as an argument into \code{ThreadedGenerateData(Region, ThreadId)}.)
478+
If more than one thread tries to write to the same place at the same time,
479+
the program can behave badly, and possibly even deadlock or crash.
480+
481+
Filters that need the thread identifier (id) should implement the
482+
\code{ThreadedGenerateData(Region, ThreadId)} method and call
483+
\code{this->DynamicMultiThreadingOff();} in the filter's constructor.
484+
480485

481486

482487
\section{Filter Conventions}
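
To make the difference between the two signatures discussed in this file more concrete, here is a standalone sketch in plain C++ (no ITK). The function names `SumWithThreadIds` and `SumWithoutThreadIds` are hypothetical. The first loosely mirrors the `ThreadedGenerateData(Region, ThreadId)` style, where each worker knows its id and can write to a private slot. The second loosely mirrors the `DynamicThreadedGenerateData(Region)` style, where the callback sees only a range, so partial results are merged through a shared atomic. This is an assumption about one common way such filters are structured, not a description of ITK internals.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Thread-id style: each worker accumulates into its own slot, indexed by its id.
long SumWithThreadIds(const std::vector<int> & data, std::size_t workers)
{
  assert(workers > 0);
  std::vector<long> partial(workers, 0);
  std::vector<std::thread> threads;
  const std::size_t chunk = (data.size() + workers - 1) / workers;
  for (std::size_t id = 0; id < workers; ++id)
  {
    threads.emplace_back([&data, &partial, chunk, id]() {
      const std::size_t begin = std::min(id * chunk, data.size());
      const std::size_t end = std::min(begin + chunk, data.size());
      for (std::size_t i = begin; i < end; ++i)
      {
        partial[id] += data[i]; // private slot, no synchronization needed
      }
    });
  }
  for (std::thread & t : threads)
  {
    t.join();
  }
  long total = 0;
  for (long p : partial)
  {
    total += p;
  }
  return total;
}

// Range-only style: the callback has no id, so each chunk sums locally and
// then adds its result to a shared atomic exactly once.
long SumWithoutThreadIds(const std::vector<int> & data, std::size_t chunkSize)
{
  assert(chunkSize > 0);
  std::atomic<long> total{ 0 };
  std::vector<std::thread> threads;
  for (std::size_t begin = 0; begin < data.size(); begin += chunkSize)
  {
    const std::size_t end = std::min(begin + chunkSize, data.size());
    threads.emplace_back([&data, &total, begin, end]() {
      long local = 0;
      for (std::size_t i = begin; i < end; ++i)
      {
        local += data[i];
      }
      total.fetch_add(local);
    });
  }
  for (std::thread & t : threads)
  {
    t.join();
  }
  return total.load();
}
```

The range-only version does not depend on how many chunks exist or which thread runs them, which is why it stays valid when chunk sizes change at run time. The thread-id version depends on a fixed worker count, which is consistent with the diff's advice to call `DynamicMultiThreadingOff()` when a filter needs its thread number.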
