Under specific conditions, algorithms in the C++ Standard Template Library (STL) can process multiple elements simultaneously on a single CPU core, rather than handling each element individually. This optimization uses single instruction, multiple data (SIMD) instructions provided by the CPU, a technique called vectorization. When this optimization isn't applied, the implementation is referred to as scalar.

The conditions required for vectorization are:

- The container or range must be contiguous. Examples include `array`, `vector`, and `basic_string`. Types like `span` and `basic_string_view` provide contiguous ranges. Built-in arrays also form contiguous ranges. Containers like `list` and `map` aren't contiguous.
- The target platform must support the necessary SIMD instructions to implement the algorithm for the element types. This is typically true for arithmetic types and simple operations.
- One of these conditions must be met:
  - The compiler can emit vectorized machine code for an implementation written as scalar code (auto-vectorization).
  - The algorithm's implementation explicitly uses vectorized code (manual vectorization).
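
Whether a given call is actually vectorized depends on the compiler, the STL version, and the target CPU, but the contiguity requirement is easy to see in code. The following sketch is illustrative only: the `vector` call is eligible for vectorization, while the `list` call always runs as scalar code.

```cpp
#include <algorithm>
#include <list>
#include <vector>

int main()
{
    // Contiguous storage: eligible for vectorization (subject to the other
    // conditions above), because the elements occupy one flat block of memory.
    std::vector<int> values{3, 1, 4, 1, 5, 9, 2, 6};
    auto largest = std::max_element(values.begin(), values.end());

    // Node-based storage: not contiguous, so this call always runs as scalar code.
    std::list<int> nodes{3, 1, 4, 1, 5, 9, 2, 6};
    auto largestInList = std::max_element(nodes.begin(), nodes.end());

    return (largest != values.end() && largestInList != nodes.end()) ? 0 : 1;
}
```
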
## Auto-vectorization in the STL

For more information about automatic vectorization, see [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) and the discussion in that article about the [`/arch`](../build/reference/arch-minimum-cpu-architecture.md) switch. This applies to the STL implementation code the same way it applies to user code.

Algorithms like `transform`, `reduce`, and `accumulate` heavily benefit from auto-vectorization.
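
The following sketch shows the kind of code that tends to auto-vectorize: simple arithmetic over contiguous arithmetic data, compiled with optimizations and a suitable `/arch` setting (for example, `/O2 /arch:AVX2`). The function names and operations are illustrative assumptions, not part of the STL.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Element-wise addition with transform: a simple operation over contiguous
// arithmetic data that the auto-vectorizer can typically turn into SIMD code.
std::vector<float> add(const std::vector<float>& a, const std::vector<float>& b)
{
    std::vector<float> result(a.size());
    std::transform(a.begin(), a.end(), b.begin(), result.begin(),
                   [](float x, float y) { return x + y; });
    return result;
}

// Summation with reduce: also a good candidate, because reduce doesn't promise
// a fixed evaluation order, which gives the compiler room to vectorize.
float sum(const std::vector<float>& a)
{
    return std::reduce(a.begin(), a.end(), 0.0f);
}
```
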
## Manual vectorization in the STL

Certain algorithms for x64 and x86 include manual vectorization. This implementation is separately compiled and relies on runtime CPU dispatch, so it applies only to suitable CPUs.

Manually vectorized algorithms use template metaprogramming to detect if the element type is suitable for vectorization. As a result, they're only vectorized for simple types such as standard integer types.

Programs generally either benefit in performance from manual vectorization or remain unaffected by it. Disable manual vectorization by defining `_USE_STD_VECTOR_ALGORITHMS=0` in your project. Manually vectorized algorithms are enabled by default on x64 and x86 because the macro defaults to 1 on those platforms.

Assign the same value to `_USE_STD_VECTOR_ALGORITHMS` for all linked translation units that use algorithms. Configure it in the project properties instead of in the source code for consistency. For more information about how to configure it, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
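
As a sketch, a project that wants to compare the scalar and manually vectorized paths could define the macro for every translation unit on the compiler command line (the equivalent of a project-property entry); the source code itself doesn't change. The build command below is only an illustration.

```cpp
// Built with the macro defined for every translation unit, for example:
//   cl /EHsc /O2 /D_USE_STD_VECTOR_ALGORITHMS=0 example.cpp
// With the macro set to 0, the call below uses the scalar implementation.
// Without the definition, the macro defaults to 1 on x64 and x86, and a
// manually vectorized path can be selected through runtime CPU dispatch.
#include <algorithm>
#include <vector>

int smallest(const std::vector<int>& values)
{
    auto it = std::min_element(values.begin(), values.end());
    return it != values.end() ? *it : 0;
}
```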

The `_USE_STD_VECTOR_ALGORITHMS` macro controls the behavior of these manually vectorized algorithms:

The STL addresses the first two considerations safely. Only `max_element`, `min_element`, `minmax_element`, `max`, `min`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized. These algorithms:
- Don’t compute new floating-point values. Instead, they compare existing values to ensure that differences in operation order don't impact precision.
- Because these algorithms depend on ordering comparisons, `NaN` values aren't allowed as inputs.
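
A minimal sketch of the floating-point case: `max_element` over contiguous `double` data only compares existing values, so the vectorized and scalar paths produce the same result, and the inputs are assumed to be free of `NaN`. The function name is illustrative.

```cpp
#include <algorithm>
#include <vector>

// Finds the largest reading. The algorithm only compares existing values and
// never computes new floating-point values, so vectorization can't change the
// numeric result.
double largest_reading(const std::vector<double>& readings)
{
    // Assumed precondition: readings contains no NaN values, because these
    // ordering algorithms don't allow NaN inputs.
    auto it = std::max_element(readings.begin(), readings.end());
    return it != readings.end() ? *it : 0.0;
}
```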

Use `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating-point types. Set it to 0 to disable vectorization. `_USE_STD_VECTOR_FLOATING_ALGORITHMS` has no effect if `_USE_STD_VECTOR_ALGORITHMS` is set to 0.

`_USE_STD_VECTOR_FLOATING_ALGORITHMS` defaults to 0 when [`/fp:except`](../build/reference/fp-specify-floating-point-behavior.md#except) is set.

Assign the same value to `_USE_STD_VECTOR_FLOATING_ALGORITHMS` for all linked translation units that use algorithms. Configure it in the project properties instead of in the source code for consistency. For more information about how to configure it, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
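
As with `_USE_STD_VECTOR_ALGORITHMS`, this is a project-wide setting. The sketch below assumes you want to keep the integer algorithms vectorized while forcing the floating-point calls back to scalar code; the build command is only an illustration.

```cpp
// Example project-wide setting (the /D flag mirrors a project-property entry):
//   cl /EHsc /O2 /D_USE_STD_VECTOR_FLOATING_ALGORITHMS=0 example.cpp
// Integer element types can still use the manually vectorized paths; calls on
// floating-point element types fall back to scalar code under this setting.
#include <algorithm>
#include <vector>

bool sorted_ints(const std::vector<int>& v) { return std::is_sorted(v.begin(), v.end()); }     // may be vectorized
bool sorted_floats(const std::vector<float>& v) { return std::is_sorted(v.begin(), v.end()); } // scalar with this setting
```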