|
1 | 1 | <cxx-clause id="parallel.alg">
|
2 | 2 | <h1>Parallel algorithms</h1>
|
3 | 3 |
|
4 |
| - <cxx-section id="parallel.alg.general"> |
5 |
| - <h1><del>In general</del></h1> |
6 |
| - |
7 |
| - <del> |
8 |
| - This clause describes components that C++ programs may use to perform operations on containers |
9 |
| - and other sequences in parallel. |
10 |
| - </del> |
11 |
| - |
12 |
| - <cxx-section id="parallel.alg.general.user"> |
13 |
| - <h1><del>Requirements on user-provided function objects</del></h1> |
14 |
| - |
15 |
| - <del> |
16 |
| - <p> |
17 |
| - Function objects passed into parallel algorithms as objects of type <code>BinaryPredicate</code>, |
18 |
| - <code>Compare</code>, and <code>BinaryOperation</code> shall not directly or indirectly modify |
19 |
| - objects via their arguments. |
20 |
| - </p> |
21 |
| - </del> |
22 |
| - </cxx-section> |
23 |
| - |
24 |
| - <cxx-section id="parallel.alg.general.exec"> |
25 |
| - <h1><del>Effect of execution policies on algorithm execution</del></h1> |
26 |
| - |
27 |
| - <del> |
28 |
| - <p> |
29 |
| - Parallel algorithms have template parameters named <code>ExecutionPolicy</code> which describe |
30 |
| - the manner in which the execution of these algorithms may be parallelized and the manner in |
31 |
| - which they apply the element access functions. |
32 |
| - </p> |
33 |
| - </del> |
34 |
| - |
35 |
| - <del> |
36 |
| - <p> |
37 |
| - The invocations of element access functions in parallel algorithms invoked with an execution |
38 |
| - policy object of type <code>sequential_execution_policy</code> execute in sequential order in |
39 |
| - the calling thread. |
40 |
| - </p> |
41 |
| - </del> |
42 |
| - |
43 |
| - <del> |
44 |
| - <p> |
45 |
| - The invocations of element access functions in parallel algorithms invoked with an execution |
46 |
| - policy object of type <code>parallel_execution_policy</code> are permitted to execute in an |
47 |
| - unordered fashion in either the invoking thread or in a thread implicitly created by the library |
48 |
| - to support parallel algorithm execution. Any such invocations executing in the same thread are |
49 |
| - indeterminately sequenced with respect to each other. |
50 |
| - |
51 |
| - <cxx-note> |
52 |
| - It is the caller's responsibility to ensure correctness, for example that the invocation does |
53 |
| - not introduce data races or deadlocks. |
54 |
| - </cxx-note> |
55 |
| - </p> |
56 |
| - </del> |
57 |
| - |
58 |
| - <del> |
59 |
| - <cxx-example><pre>using namespace std::experimental::parallel; |
| -int a[] = {0,1}; |
| -std::vector<int> v; |
| -for_each(par, std::begin(a), std::end(a), [&](int i) { |
| -  v.push_back(i*2+1); |
| -}); |
| -</pre> |
66 |
| - |
67 |
| - The program above has a data race because of the unsynchronized access to the container |
68 |
| - <code>v</code>. |
69 |
| - </cxx-example></del><pre> |
70 |
| -</pre> |
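A race-free variant of the first example above can be sketched by giving every invocation its own output slot instead of sharing one growing container. This is an illustration under stated assumptions, not text from the specification: `odd_images` is a hypothetical helper name, and the call is shown without an execution policy so that it builds with any standard library. The write pattern (one distinct, pre-allocated element per invocation) is what would keep the element access functions from racing if a policy such as `par` were supplied.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of the specification): maps each input
// element i to i*2+1, writing every result into its own pre-allocated slot.
// No invocation touches a location written by another invocation, so there
// is no shared mutable state comparable to the push_back in the example.
std::vector<int> odd_images(const std::vector<int>& in) {
    std::vector<int> out(in.size());  // sized up front; never grows afterwards
    std::transform(in.begin(), in.end(), out.begin(),
                   [](int i) { return i * 2 + 1; });
    return out;
}
```

Calling `odd_images({0, 1})` yields a two-element vector holding the same values the original example tried to append, in input order.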
71 |
| - |
72 |
| - <del> |
73 |
| - <cxx-example><pre> |
| -using namespace std::experimental::parallel; |
| -std::atomic<int> x = 0; |
| -int a[] = {1,2}; |
| -for_each(par, std::begin(a), std::end(a), [&](int n) { |
| -  x.fetch_add(1, std::memory_order_relaxed); |
| -  // spin wait for another iteration to change the value of x |
| -  while (x.load(std::memory_order_relaxed) == 1) { } |
| -});</pre> |
82 |
| - |
83 |
| - The above example depends on the order of execution of the iterations, and is therefore |
84 |
| - undefined (may deadlock). |
85 |
| - </cxx-example></del><pre> |
86 |
| -</pre> |
87 |
| - |
88 |
| - <del> |
89 |
| - <cxx-example><pre> |
| -using namespace std::experimental::parallel; |
| -int x=0; |
| -std::mutex m; |
| -int a[] = {1,2}; |
| -for_each(par, std::begin(a), std::end(a), [&](int) { |
| -  m.lock(); |
| -  ++x; |
| -  m.unlock(); |
| -});</del></pre> |
99 |
| - |
100 |
| - The above example synchronizes access to object <code>x</code> ensuring that it is |
101 |
| - incremented correctly. |
102 |
| - </cxx-example> |
103 |
| - |
104 |
| - <p> |
105 |
| - The invocations of element access functions in parallel algorithms invoked with an |
106 |
| - execution policy of type <code>unsequenced_policy</code> are permitted to execute |
107 |
| - in an unordered fashion in the calling thread, unsequenced with respect to one another |
108 |
| - within the calling thread. |
109 |
| - |
110 |
| - <cxx-note> |
111 |
| - This means that multiple function object invocations may be interleaved on a single thread. |
112 |
| - </cxx-note> |
113 |
| - <pre> |
114 |
| -</pre> |
115 |
| - |
116 |
| - <cxx-note> |
117 |
| - This overrides the usual guarantee from the C++ standard, Section 1.9 [intro.execution] that |
118 |
| - function executions do not interleave with one another. |
119 |
| - </cxx-note> |
120 |
| - </p> |
121 |
| - |
122 |
| - <p> |
123 |
| - The invocations of element access functions in parallel algorithms invoked with an |
124 |
| - execution policy of type <code>vector_policy</code> are permitted to execute |
125 |
| - in an unordered fashion in the calling thread, unsequenced with respect to one another |
126 |
| - within the calling thread, subject to the sequencing constraints of wavefront application |
127 |
| - (<cxx-ref to="parallel.alg.general.wavefront"></cxx-ref>) for the last argument to |
128 |
| - <code>for_loop</code> or <code>for_loop_strided</code>. |
129 |
| - </p> |
130 |
| - |
131 |
| - <p> |
132 |
| - The invocations of element access functions in parallel algorithms invoked with an execution |
133 |
| - policy of type <code>parallel_vector_execution_policy</code> |
134 |
| - are permitted to execute in an unordered fashion in unspecified threads, and unsequenced |
135 |
| - with respect to one another within each thread. |
136 |
| - <cxx-note> |
137 |
| - This means that multiple function object invocations may be interleaved on a single thread. |
138 |
| - </cxx-note> |
139 |
| - <pre> |
140 |
| -</pre> |
141 |
| - |
142 |
| - <cxx-note> |
143 |
| - This overrides the usual guarantee from the C++ standard, Section 1.9 [intro.execution] that |
144 |
| - function executions do not interleave with one another. |
145 |
| - </cxx-note> |
146 |
| - <pre> |
147 |
| -</pre> |
148 |
| - |
149 |
| - Since <code>parallel_vector_execution_policy</code> allows the execution of element access functions to be |
150 |
| - interleaved on a single thread, synchronization, including the use of mutexes, risks deadlock. Thus the |
151 |
| - synchronization with <code>parallel_vector_execution_policy</code> is restricted as follows:<pre> |
152 |
| -</pre> |
153 |
| - |
154 |
| - A standard library function is <em>vectorization-unsafe</em> if it is specified to synchronize with |
155 |
| - another function invocation, or another function invocation is specified to synchronize with it, and if |
156 |
| - it is not a memory allocation or deallocation function. Vectorization-unsafe standard library functions |
157 |
| - may not be invoked by user code called from <code>parallel_vector_execution_policy</code> algorithms.<pre> |
158 |
| -</pre> |
159 |
| - |
160 |
| - <cxx-note> |
161 |
| - Implementations must ensure that internal synchronization inside standard library routines does not |
162 |
| - induce deadlock. |
163 |
| - </cxx-note> |
164 |
| - </p> |
165 |
| - |
166 |
| - <cxx-example><pre> |
| -using namespace std::experimental::parallel; |
| -int x=0; |
| -std::mutex m; |
| -int a[] = {1,2}; |
| -for_each(par_vec, std::begin(a), std::end(a), [&](int) { |
| -  m.lock(); |
| -  ++x; |
| -  m.unlock(); |
| -});</pre> |
176 |
| - |
177 |
| - The above program is invalid because the applications of the function object are not |
178 |
| - guaranteed to run on different threads. |
179 |
| - </cxx-example><pre> |
180 |
| -</pre> |
181 |
| - |
182 |
| - <cxx-note> |
183 |
| - The application of the function object may result in two consecutive calls to |
184 |
| - <code>m.lock</code> on the same thread, which may deadlock. |
185 |
| - </cxx-note><pre> |
186 |
| -</pre> |
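The hazard described in the note above can be modelled on a single thread without invoking undefined behavior. The sketch below is illustrative only. `flag_lock` is a hypothetical stand-in for a non-recursive lock; the real `std::mutex` is deliberately not used, because re-locking a mutex that the calling thread already owns is undefined. The two "invocations" are interleaved by hand, in the way an implementation is permitted to interleave them under `parallel_vector_execution_policy`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a non-recursive lock; not part of any library.
struct flag_lock {
    std::atomic<bool> held{false};

    // Returns true if the lock was free and is now held by the caller.
    bool try_acquire() { return !held.exchange(true, std::memory_order_acquire); }

    void release() { held.store(false, std::memory_order_release); }
};

// Hand-interleaves two invocations of a "lock; ++x; unlock" function object
// on one thread: both reach the lock step before either reaches unlock.
// Returns true when the second lock attempt finds the lock already held,
// i.e. when a blocking lock() at that point could never make progress.
bool second_lock_would_block() {
    flag_lock m;
    int x = 0;

    const bool a_locked = m.try_acquire();  // invocation A: lock step
    const bool b_locked = m.try_acquire();  // invocation B: lock step, A still holds m

    if (a_locked) { ++x; m.release(); }     // A finishes only because B did not block
    (void)x;                                // x is kept only to mirror the example
    return a_locked && !b_locked;
}
```

With a blocking `lock()` in place of `try_acquire()`, invocation B would wait on the same thread that must run A's unlock step, which is the deadlock the note describes.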
187 |
| - |
188 |
| - <cxx-note> |
189 |
| - The semantics of the <code>parallel_execution_policy</code> or the |
190 |
| - <code>parallel_vector_execution_policy</code> invocation allow the implementation to fall back to |
191 |
| - sequential execution if the system cannot parallelize an algorithm invocation due to lack of |
192 |
| - resources. |
193 |
| - </cxx-note> |
194 |
| - |
195 |
| - <p> |
196 |
| - Algorithms invoked with an execution policy object of type <code>execution_policy</code> |
197 |
| - execute internally as if invoked with the contained execution policy object. |
198 |
| - </p> |
199 |
| - |
200 |
| - <p> |
201 |
| - The semantics of parallel algorithms invoked with an execution policy object of |
202 |
| - implementation-defined type are implementation-defined. |
203 |
| - </p> |
204 |
| - </cxx-section> |
205 |
| - |
206 | 4 | <cxx-section id="parallel.alg.general.wavefront">
|
207 | 5 | <h1>Wavefront Application</h1>
|
208 | 6 | <p>
|
|