
Commit 832ce4b

Apply P0075R2
1 parent 465f7fe commit 832ce4b

2 files changed

+368
-0
lines changed

algorithms.html

Lines changed: 360 additions & 0 deletions
@@ -548,13 +548,373 @@ <h1>Header <code>&lt;experimental/algorithm&gt;</code> synopsis</h1>
548548
template&lt;class T&gt;
549549
ordered_update_t&lt;T&gt; ordered_update(T&amp; ref) noexcept;
550550
}
551+
<ins>
552+
553+
// Exposition only: Suppress template argument deduction.
554+
template&lt;class T&gt; struct no_deduce { using type = T; };
555+
template&lt;class T&gt; using no_deduce_t = typename no_deduce&lt;T&gt;::type;
556+
557+
<cxx-ref insynopsis="" to="parallel.alg.reductions"></cxx-ref> Support for reductions
558+
template&lt;class T, class BinaryOperation&gt;
559+
<em>unspecified</em> reduction(T&amp; var, const T&amp; identity, BinaryOperation combiner);
560+
template&lt;class T&gt;
561+
<em>unspecified</em> reduction_plus(T&amp; var);
562+
template&lt;class T&gt;
563+
<em>unspecified</em> reduction_multiplies(T&amp; var);
564+
template&lt;class T&gt;
565+
<em>unspecified</em> reduction_bit_and(T&amp; var);
566+
template&lt;class T&gt;
567+
<em>unspecified</em> reduction_bit_or(T&amp; var);
568+
template&lt;class T&gt;
569+
<em>unspecified</em> reduction_bit_xor(T&amp; var);
570+
template&lt;class T&gt;
571+
<em>unspecified</em> reduction_min(T&amp; var);
572+
template&lt;class T&gt;
573+
<em>unspecified</em> reduction_max(T&amp; var);
574+
575+
<cxx-ref insynopsis="" to="parallel.alg.inductions"></cxx-ref> Support for inductions
576+
template&lt;class T&gt;
577+
<em>unspecified</em> induction(T&amp;&amp; var);
578+
template&lt;class T, class S&gt;
579+
<em>unspecified</em> induction(T&amp;&amp; var, S stride);
580+
581+
<cxx-ref insynopsis="" to="parallel.alg.forloop"></cxx-ref> for_loop
582+
template&lt;class I, class... Rest&gt;
583+
void for_loop(no_deduce_t&lt;I&gt; start, I finish, Rest&amp;&amp;... rest);
584+
template&lt;class ExecutionPolicy,
585+
class I, class... Rest&gt;
586+
void for_loop(ExecutionPolicy&amp;&amp; exec,
587+
no_deduce_t&lt;I&gt; start, I finish, Rest&amp;&amp;... rest);
588+
template&lt;class I, class S, class... Rest&gt;
589+
void for_loop_strided(no_deduce_t&lt;I&gt; start, I finish,
590+
S stride, Rest&amp;&amp;... rest);
591+
template&lt;class ExecutionPolicy,
592+
class I, class S, class... Rest&gt;
593+
void for_loop_strided(ExecutionPolicy&amp;&amp; exec,
594+
no_deduce_t&lt;I&gt; start, I finish,
595+
S stride, Rest&amp;&amp;... rest);
596+
template&lt;class I, class Size, class... Rest&gt;
597+
void for_loop_n(I start, Size n, Rest&amp;&amp;... rest);
598+
template&lt;class ExecutionPolicy,
599+
class I, class Size, class... Rest&gt;
600+
void for_loop_n(ExecutionPolicy&amp;&amp; exec,
601+
I start, Size n, Rest&amp;&amp;... rest);
602+
template&lt;class I, class Size, class S, class... Rest&gt;
603+
void for_loop_n_strided(I start, Size n, S stride, Rest&amp;&amp;... rest);
604+
template&lt;class ExecutionPolicy,
605+
class I, class Size, class S, class... Rest&gt;
606+
void for_loop_n_strided(ExecutionPolicy&amp;&amp; exec,
607+
I start, Size n, S stride, Rest&amp;&amp;... rest);
608+
</ins>
551609
<del>}</del>
552610
}
553611
}
554612
<del>}</del>
555613
</pre>
556614
</cxx-section>
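The synopsis above uses the exposition-only <code>no_deduce_t</code> to keep <code>start</code> out of template argument deduction, so that <code>I</code> is deduced from <code>finish</code> alone. The following is a minimal sketch of that effect in standard C++; <code>span_length</code> is a hypothetical stand-in with the same parameter shape as <code>for_loop</code>, not part of the proposal.

```cpp
#include <cassert>
#include <type_traits>

// Exposition-only helpers, as in the synopsis.
template<class T> struct no_deduce { using type = T; };
template<class T> using no_deduce_t = typename no_deduce<T>::type;

// Hypothetical function with the same parameter shape as for_loop:
// `start` is a non-deduced context, so I comes from `finish` only.
template<class I>
I span_length(no_deduce_t<I> start, I finish) {
    return finish - start;
}

// Usage: the literal 0 is an int, but I is deduced as long from the
// second argument and the literal is converted to long.
inline long demo_span_length() {
    long n = 10;
    return span_length(0, n);
}
```

Without the non-deduced parameter, a call mixing <code>int</code> and <code>long</code> arguments would fail deduction because the two arguments would give conflicting types for <code>I</code>.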
557615

616+
<cxx-section id="parallel.alg.reductions">
617+
<h1><ins>Reductions</ins></h1>
618+
619+
<ins>
620+
<p>
621+
Each of the function templates in this subclause ([parallel.alg.reductions]) returns a <em>reduction object</em>
622+
of unspecified type having a <em>reduction value type</em> and encapsulating a <em>reduction identity</em> value for the reduction, a
623+
<em>combiner</em> function object, and a <em>live-out object</em> from which the initial value is obtained and into which the final
624+
value is stored.
625+
</p>
626+
627+
<p>
628+
An algorithm uses reduction objects by allocating an unspecified number of instances, known as <em>accumulators</em>, of the reduction value
629+
type. <cxx-note>An implementation might, for example, allocate an accumulator for each thread in its private thread pool.</cxx-note>
630+
Each accumulator is initialized with the object’s reduction identity, except that the live-out object (which was initialized by the
631+
caller) comprises one of the accumulators. The algorithm passes a reference to an accumulator to each application of an element-access
632+
function, ensuring that no two concurrently executing invocations share the same accumulator. An accumulator can be shared between two
633+
applications that do not execute concurrently, but initialization is performed only once per accumulator.
634+
</p>
635+
636+
<p>
637+
Modifications to the accumulator by the application of element access functions accrue as partial results. At some point before the algorithm
638+
returns, the partial results are combined, two at a time, using the reduction object’s combiner operation until a single value remains, which
639+
is then assigned back to the live-out object. <cxx-note>In order to produce useful results, modifications to the accumulator should be limited
640+
to commutative operations closely related to the combiner operation. For example, if the combiner is <code>plus&lt;T&gt;</code>, incrementing
641+
the accumulator would be consistent with the combiner but doubling it or assigning to it would not.</cxx-note>
642+
</p>
643+
</ins>
644+
645+
<cxx-function>
646+
<cxx-signature><ins>template&lt;class T, class BinaryOperation&gt;
647+
<em>unspecified</em> reduction(T&amp; var, const T&amp; identity, BinaryOperation combiner);</ins></cxx-signature>
648+
649+
<ins>
650+
<cxx-requires><ins>T shall meet the requirements of <code>CopyConstructible</code> and <code>MoveAssignable</code>. The expression <code>var = combiner(var, var)</code> shall be well-formed.</ins></cxx-requires>
651+
</ins>
652+
653+
<ins>
654+
<cxx-returns><ins>a reduction object of unspecified type having reduction value type <code>T</code>, reduction identity <code>identity</code>, combiner function object <code>combiner</code>, and using the object referenced by <code>var</code> as its live-out object.</ins></cxx-returns>
655+
</ins>
656+
</cxx-function>
657+
658+
<cxx-function>
659+
<cxx-signature><ins>template&lt;class T&gt;
660+
<em>unspecified</em> reduction_plus(T&amp; var);</ins></cxx-signature>
661+
<cxx-signature><ins>template&lt;class T&gt;
662+
<em>unspecified</em> reduction_multiplies(T&amp; var);</ins></cxx-signature>
663+
<cxx-signature><ins>template&lt;class T&gt;
664+
<em>unspecified</em> reduction_bit_and(T&amp; var);</ins></cxx-signature>
665+
<cxx-signature><ins>template&lt;class T&gt;
666+
<em>unspecified</em> reduction_bit_or(T&amp; var);</ins></cxx-signature>
667+
<cxx-signature><ins>template&lt;class T&gt;
668+
<em>unspecified</em> reduction_bit_xor(T&amp; var);</ins></cxx-signature>
669+
<cxx-signature><ins>template&lt;class T&gt;
670+
<em>unspecified</em> reduction_min(T&amp; var);</ins></cxx-signature>
671+
<cxx-signature><ins>template&lt;class T&gt;
672+
<em>unspecified</em> reduction_max(T&amp; var);</ins></cxx-signature>
673+
674+
<ins>
675+
<cxx-requires><ins>T shall meet the requirements of <code>CopyConstructible</code> and <code>MoveAssignable</code>.</ins></cxx-requires>
676+
</ins>
677+
678+
<ins>
679+
<cxx-returns><ins>a reduction object of unspecified type having reduction value type <code>T</code>, reduction identity and combiner operation as specified in table <cxx-ref to="reduction-identities-and-combiner-operations"></cxx-ref> and using the object referenced by <code>var</code> as its live-out object.</ins></cxx-returns>
680+
</ins>
681+
682+
<table is="cxx-table" class="column-rules" id=reduction-identities-and-combiner-operations>
683+
<caption><ins>Reduction identities and combiner operations</ins></caption>
684+
<thead>
685+
<tr>
686+
<th><ins>Function</ins></th>
687+
<th><ins>Reduction Identity</ins></th>
688+
<th><ins>Combiner Operation</ins></th>
689+
</tr>
690+
<tr>
691+
<th><ins><code>reduction_plus</code></ins></th>
692+
<th><ins><code>T()</code></ins></th>
693+
<th><ins><code>x + y</code></ins></th>
694+
</tr>
695+
<tr>
696+
<th><ins><code>reduction_multiplies</code></ins></th>
697+
<th><ins><code>T(1)</code></ins></th>
698+
<th><ins><code>x * y</code></ins></th>
699+
</tr>
700+
<tr>
701+
<th><ins><code>reduction_bit_and</code></ins></th>
702+
<th><ins><code>(~T())</code></ins></th>
703+
<th><ins><code>x &amp; y</code></ins></th>
704+
</tr>
705+
<tr>
706+
<th><ins><code>reduction_bit_or</code></ins></th>
707+
<th><ins><code>T()</code></ins></th>
708+
<th><ins><code>x | y</code></ins></th>
709+
</tr>
710+
<tr>
711+
<th><ins><code>reduction_bit_xor</code></ins></th>
712+
<th><ins><code>T()</code></ins></th>
713+
<th><ins><code>x ^ y</code></ins></th>
714+
</tr>
715+
<tr>
716+
<th><ins><code>reduction_min</code></ins></th>
717+
<th><ins><code>var</code></ins></th>
718+
<th><ins><code>min(x, y)</code></ins></th>
719+
</tr>
720+
<tr>
721+
<th><ins><code>reduction_max</code></ins></th>
722+
<th><ins><code>var</code></ins></th>
723+
<th><ins><code>max(x, y)</code></ins></th>
724+
</tr>
725+
</thead>
726+
</table>
727+
728+
<ins>
729+
<cxx-example><ins>The following code updates each element of <code>y</code> and sets <code>s</code> to the sum of the squares.
730+
<pre>
731+
extern int n;
732+
extern float x[], y[], a;
733+
float s = 0;
734+
for_loop(execution::vec, 0, n,
735+
reduction(s, 0.0f, plus&lt;&gt;()),
736+
[&amp;](int i, float&amp; accum) {
737+
y[i] += a*x[i];
738+
accum += y[i]*y[i];
739+
}
740+
);
741+
</pre>
742+
</ins></cxx-example>
743+
</ins>
744+
</cxx-function>
745+
</cxx-section>
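The accumulator model described in the Reductions subclause can be emulated sequentially in standard C++. This is a sketch under stated assumptions, not the proposed implementation: <code>emulate_reduction</code>, its <code>chunks</code> parameter (standing in for the unspecified number of accumulators), and <code>sum_of_squares</code> are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: emulates the accumulator model without any parallelism.
// `chunks` stands in for the unspecified number of accumulators.
template<class T, class BinaryOperation, class F>
void emulate_reduction(std::size_t n, std::size_t chunks,
                       T& var, const T& identity,
                       BinaryOperation combiner, F f) {
    if (chunks == 0) chunks = 1;
    // Every accumulator starts at the identity, except one that
    // starts from the caller-initialized live-out object.
    std::vector<T> acc(chunks, identity);
    acc[0] = var;
    // Each application of f receives a reference to one accumulator.
    for (std::size_t i = 0; i < n; ++i)
        f(i, acc[i % chunks]);
    // Combine the partial results two at a time, then assign the
    // single remaining value back to the live-out object.
    T result = acc[0];
    for (std::size_t k = 1; k < chunks; ++k)
        result = combiner(result, acc[k]);
    var = result;
}

// Usage: sum of i*i for i in [0, n), independent of the accumulator count.
inline int sum_of_squares(int n, std::size_t chunks) {
    int s = 0;
    emulate_reduction(static_cast<std::size_t>(n), chunks, s, 0,
                      [](int a, int b) { return a + b; },
                      [](std::size_t i, int& accum) {
                          accum += static_cast<int>(i * i);
                      });
    return s;
}
```

Because the element-access function only adds to its accumulator, the result does not depend on how many accumulators are used, which is the property the note about operations consistent with the combiner is protecting.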
746+
747+
<cxx-section id="parallel.alg.inductions">
748+
<h1><ins>Inductions</ins></h1>
749+
750+
<ins>
751+
<p>
752+
Each of the function templates in this section returns an <em>induction object</em> of unspecified type having an <em>induction
753+
value type</em> and encapsulating an initial value <em>i</em> of that type and, optionally, a <em>stride</em>.
754+
</p>
755+
756+
<p>
757+
For each element in the input range, an algorithm over input sequence <em>S</em> computes an <em>induction value</em> from an induction variable
758+
and ordinal position <em>p</em> within <em>S</em> by the formula <em>i + p * stride</em> if a stride was specified or <em>i + p</em> otherwise. This induction value is
759+
passed to the element access function.
760+
</p>
761+
762+
<p>
763+
An induction object may refer to a <em>live-out</em> object to hold the final value of the induction sequence. When the algorithm using the induction
764+
object completes, the live-out object is assigned the value <em>i + n * stride</em>, where <em>n</em> is the number of elements in the input range.
765+
</p>
766+
</ins>
767+
768+
<cxx-function>
769+
<cxx-signature><ins>template&lt;class T&gt;
770+
<em>unspecified</em> induction(T&amp;&amp; var);</ins></cxx-signature>
771+
772+
<cxx-signature><ins>template&lt;class T, class S&gt;
773+
<em>unspecified</em> induction(T&amp;&amp; var, S stride);</ins></cxx-signature>
774+
775+
<ins>
776+
<cxx-returns>
777+
<ins>
778+
an induction object with induction value type <code>remove_cv_t&lt;remove_reference_t&lt;T&gt;&gt;</code>,
779+
initial value <code>var</code>, and (if specified) stride <code>stride</code>. If <code>T</code> is an lvalue reference
780+
to non-<code>const</code> type, then the object referenced by <code>var</code> becomes the live-out object for the
781+
induction object; otherwise there is no live-out object.
782+
</ins>
783+
</cxx-returns>
784+
</ins>
785+
</cxx-function>
786+
</cxx-section>
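The induction value formula and the live-out assignment from the Inductions subclause can be written out directly. The sketch below assumes an integral induction value type and is purely illustrative; <code>induction_value</code>, <code>emulate_induction</code>, and <code>demo_live_out</code> are hypothetical names, not functions from the proposal.

```cpp
#include <cassert>

// Hypothetical helper: induction value at ordinal position p, with a stride.
template<class T, class S>
T induction_value(T i, long p, S stride) {
    return static_cast<T>(i + p * stride);
}

// Hypothetical helper: induction value at ordinal position p, without a stride.
template<class T>
T induction_value(T i, long p) {
    return static_cast<T>(i + p);
}

// Hypothetical helper: passes the induction value to f for each of n
// positions, then assigns i + n * stride to the live-out object.
template<class T, class S, class F>
void emulate_induction(long n, T& var, S stride, F f) {
    const T initial = var;
    for (long p = 0; p < n; ++p)
        f(p, induction_value(initial, p, stride));
    var = induction_value(initial, n, stride);
}

// Usage: starting at 4 with stride 2 over 3 elements leaves 4 + 3 * 2.
inline int demo_live_out() {
    int j = 4;
    emulate_induction(3, j, 2, [](long, int) {});
    return j;
}
```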
787+
788+
<cxx-section id="parallel.alg.forloop">
789+
<h1><ins>For loop</ins></h1>
790+
791+
<cxx-function>
792+
<cxx-signature><ins>template&lt;class I, class... Rest&gt;
793+
void for_loop(no_deduce_t&lt;I&gt; start, I finish, Rest&amp;&amp;... rest);</ins></cxx-signature>
794+
795+
<cxx-signature><ins>template&lt;class ExecutionPolicy,
796+
class I, class... Rest&gt;
797+
void for_loop(ExecutionPolicy&amp;&amp; exec,
798+
no_deduce_t&lt;I&gt; start, I finish, Rest&amp;&amp;... rest);
799+
800+
</ins></cxx-signature>
801+
802+
<cxx-signature><ins>template&lt;class I, class S, class... Rest&gt;
803+
void for_loop_strided(no_deduce_t&lt;I&gt; start, I finish,
804+
S stride, Rest&amp;&amp;... rest);</ins></cxx-signature>
805+
806+
<cxx-signature><ins>template&lt;class ExecutionPolicy,
807+
class I, class S, class... Rest&gt;
808+
void for_loop_strided(ExecutionPolicy&amp;&amp; exec,
809+
no_deduce_t&lt;I&gt; start, I finish,
810+
S stride, Rest&amp;&amp;... rest);
811+
812+
</ins></cxx-signature>
813+
814+
<cxx-signature><ins>template&lt;class I, class Size, class... Rest&gt;
815+
void for_loop_n(I start, Size n, Rest&amp;&amp;... rest);</ins></cxx-signature>
816+
817+
<cxx-signature><ins>template&lt;class ExecutionPolicy,
818+
class I, class Size, class... Rest&gt;
819+
void for_loop_n(ExecutionPolicy&amp;&amp; exec,
820+
I start, Size n, Rest&amp;&amp;... rest);
821+
822+
</ins></cxx-signature>
823+
824+
<cxx-signature><ins>template&lt;class I, class Size, class S, class... Rest&gt;
825+
void for_loop_n_strided(I start, Size n, S stride, Rest&amp;&amp;... rest);</ins></cxx-signature>
826+
827+
<cxx-signature><ins>template&lt;class ExecutionPolicy,
828+
class I, class Size, class S, class... Rest&gt;
829+
void for_loop_n_strided(ExecutionPolicy&amp;&amp; exec,
830+
I start, Size n, S stride, Rest&amp;&amp;... rest);</ins></cxx-signature>
831+
832+
<ins>
833+
<cxx-requires>
834+
<ins>
835+
For the overloads with an <code>ExecutionPolicy</code>, <code>I</code> shall be an integral type
836+
or meet the requirements of a forward iterator type; otherwise, <code>I</code> shall be an integral
837+
type or meet the requirements of an input iterator type. <code>Size</code> shall be an integral type
838+
and <code>n</code> shall be non-negative. <code>S</code> shall have integral type and <code>stride</code>
839+
shall have non-zero value. <code>stride</code> shall be negative only if <code>I</code> has integral
840+
type or meets the requirements of a bidirectional iterator. The <code>rest</code> parameter pack shall
841+
have at least one element, comprising objects returned by invocations of <code>reduction</code>
842+
([parallel.alg.reductions]) and/or <code>induction</code> ([parallel.alg.inductions]) function templates
843+
followed by exactly one invocable element-access function, <em>f</em>. For the overloads with an
844+
<code>ExecutionPolicy</code>, <em>f</em> shall meet the requirements of <code>CopyConstructible</code>;
845+
otherwise, <em>f</em> shall meet the requirements of <code>MoveConstructible</code>.
846+
</ins>
847+
</cxx-requires>
848+
</ins>
849+
850+
<ins>
851+
<cxx-effects>
852+
<ins>
853+
Applies <em>f</em> to each element in the <em>input sequence</em>, as described below, with additional
854+
arguments corresponding to the reductions and inductions in the <code>rest</code> parameter pack. The
855+
length of the input sequence is:
856+
857+
<ul>
858+
<li>
859+
<code>n</code>, if specified,
860+
</li>
861+
862+
<li>
863+
otherwise <code>finish - start</code> if neither <code>n</code> nor <code>stride</code> is specified,
864+
</li>
865+
866+
<li>
867+
otherwise <code>1 + (finish-start-1)/stride</code> if <code>stride</code> is positive,
868+
</li>
869+
870+
<li>
871+
otherwise <code>1 + (start-finish-1)/-stride</code>.
872+
</li>
873+
</ul>
874+
875+
The first element in the input sequence is <code>start</code>. Each subsequent element is generated by adding
876+
<code>stride</code> to the previous element, if <code>stride</code> is specified, otherwise by incrementing
877+
the previous element. <cxx-note>As described in the C++ standard, section [algorithms.general], arithmetic
878+
on non-random-access iterators is performed using advance and distance.</cxx-note> <cxx-note>The order of the
879+
elements of the input sequence is important for determining ordinal position of an application of <em>f</em>,
880+
even though the applications themselves may be unordered.</cxx-note>
881+
882+
The first argument to <em>f</em> is an element from the input sequence. <cxx-note>If <code>I</code> is an
883+
iterator type, the iterators in the input sequence are not dereferenced before
884+
being passed to <em>f</em>.</cxx-note> For each member of the rest parameter pack
885+
excluding <em>f</em>, an additional argument is passed to each application of <em>f</em> as follows:
886+
887+
<ul>
888+
<li>
889+
If the pack member is an object returned by a call to a reduction function listed in section
890+
[parallel.alg.reductions], then the additional argument is a reference to an accumulator of that reduction
891+
object.
892+
</li>
893+
894+
<li>
895+
If the pack member is an object returned by a call to <code>induction</code>, then the additional argument is the
896+
induction value for that induction object corresponding to the position of the application of <em>f</em> in the input
897+
sequence.
898+
</li>
899+
</ul>
900+
</ins>
901+
</cxx-effects>
902+
</ins>
903+
904+
<ins>
905+
<cxx-complexity>
906+
<ins>Applies <em>f</em> exactly once for each element of the input sequence.</ins>
907+
</cxx-complexity>
908+
</ins>
909+
910+
<ins>
911+
<cxx-remarks>
912+
<ins>If <em>f</em> returns a result, the result is ignored.</ins>
913+
</cxx-remarks>
914+
</ins>
915+
</cxx-function>
916+
</cxx-section>
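The input sequence length rules for the strided overloads can be checked with a small sketch. It assumes an integral <code>I</code>, a non-zero <code>stride</code>, and a non-empty range in the direction of the stride; <code>strided_length</code> and <code>strided_elements</code> are hypothetical helper names, not part of the proposal.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: length of the input sequence for for_loop_strided
// with integral bounds, following the two stride cases in the Effects.
inline long strided_length(long start, long finish, long stride) {
    return stride > 0 ? 1 + (finish - start - 1) / stride
                      : 1 + (start - finish - 1) / -stride;
}

// Hypothetical helper: the elements of the input sequence. The first
// element is start; each subsequent element adds stride to the previous one.
inline std::vector<long> strided_elements(long start, long finish, long stride) {
    std::vector<long> out;
    const long len = strided_length(start, finish, stride);
    long value = start;
    for (long p = 0; p < len; ++p, value += stride)
        out.push_back(value);
    return out;
}
```

For example, under these assumptions <code>start = 0</code>, <code>finish = 10</code>, <code>stride = 3</code> gives the four elements 0, 3, 6, 9, and the mirrored call with <code>start = 10</code>, <code>finish = 0</code>, <code>stride = -3</code> gives 10, 7, 4, 1.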
917+
558918
<cxx-section id="parallel.alg.foreach">
559919
<h1><del>For each</del></h1>
560920

general.html

Lines changed: 8 additions & 0 deletions
@@ -80,6 +80,14 @@ <h1>Feature-testing recommendations</h1>
8080
<code>&lt;experimental/execution&gt;</code><br>
8181
</td>
8282
</tr>
83+
<tr>
84+
<td><ins>P0075R2</ins></td>
85+
<td><ins>Template Library for Parallel For Loops</ins></td>
86+
<td><cxx-ref to="parallel.alg.reductions"></cxx-ref>, <cxx-ref to="parallel.alg.inductions"></cxx-ref>, <cxx-ref to="parallel.alg.forloop"></cxx-ref></td>
87+
<td><code><ins>__cpp_lib_experimental_parallel_for_loop</ins></code></td>
88+
<td><ins>201711</ins></td>
89+
<td><code><ins>&lt;experimental/algorithm&gt;</ins></code></td>
90+
</tr>
8391
</thead>
8492
</table>
8593
</cxx-section>
