-Under specific conditions, algorithms in the C++ Standard Template Library (STL) can process multiple elements simultaneously on a single CPU core, rather than handling each element individually. This optimization uses single instruction, multiple data (SIMD) instructions provided by the CPU, a technique called vectorization. When this optimization isn't applied, the implementation is referred to as scalar.
+Under specific conditions, algorithms in the MSVC Standard Template Library (STL) can process multiple elements simultaneously on a single CPU core, rather than handling each element individually. This optimization uses single instruction, multiple data (SIMD) instructions provided by the CPU, a technique called vectorization. When this optimization isn't applied, the implementation is referred to as scalar.
The conditions required for vectorization are:
- The container or range must be contiguous. Examples include `array`, `vector`, and `basic_string`. Types like `span` and `basic_string_view` provide contiguous ranges. Built-in arrays also form contiguous ranges. Containers like `list` and `map` aren't contiguous.
- The target platform must support the necessary SIMD instructions to implement the algorithm for the element types. This is typically true for arithmetic types and simple operations.
- One of these conditions must be met:
-- The compiler can emit vectorized machine code for an implementation written as scalar code (auto-vectorization).
-- The algorithm's implementation explicitly uses vectorized code (manual vectorization).
+- The compiler can emit vectorized machine code for an implementation written as scalar code (auto-vectorization).
+- The algorithm's implementation explicitly uses vectorized code (manual vectorization).
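
As a rough illustration of the contiguity condition listed above, the sketch below calls the same algorithm on a `vector`, a built-in array, and a `list`. The helper names are hypothetical and exist only for this example. Whether a given call is actually vectorized depends on the implementation, the element type, and the target CPU; the results are the same either way.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper: counts zeros in a contiguous range of a simple
// integer type. An implementation may vectorize this call.
inline std::ptrdiff_t count_zeros_contiguous(const std::vector<int>& v) {
    return std::count(v.begin(), v.end(), 0);
}

// Hypothetical helper: built-in arrays also form contiguous ranges.
template <std::size_t N>
std::ptrdiff_t count_zeros_array(const int (&a)[N]) {
    return std::count(std::begin(a), std::end(a), 0);
}

// Hypothetical helper: std::list is node-based, not contiguous, so the
// contiguity condition isn't met for this call.
inline std::ptrdiff_t count_zeros_list(const std::list<int>& l) {
    return std::count(l.begin(), l.end(), 0);
}
```

All three functions return the same kind of result; only the storage layout of the input differs.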
## Auto-vectorization in the STL
For more information about automatic vectorization, see [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) and the discussion in that article about the [`/arch`](../build/reference/arch-minimum-cpu-architecture.md) switch. This applies to the STL implementation code the same way it applies to user code.
-Algorithms like `transform`, `reduce`, and `accumulate`heavily benefit from auto-vectorization.
+Algorithms like `transform`, `reduce`, and `accumulate` benefit heavily from auto-vectorization.
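
A minimal sketch of the kind of call that auto-vectorization can help, using two of the algorithms named above. The function names are hypothetical. Whether the compiler actually emits SIMD code depends on the optimization level, the `/arch` setting, and the loop body, so treat this as an illustration rather than a guarantee.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper: doubles every element in place. A simple arithmetic
// operation over a contiguous range is a typical auto-vectorization candidate.
inline void scale_by_two(std::vector<int>& v) {
    std::transform(v.begin(), v.end(), v.begin(),
                   [](int x) { return x * 2; });
}

// Hypothetical helper: sums the elements with std::accumulate.
inline int sum_accumulate(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}
```

The code is ordinary scalar C++; any vectorization happens in the compiler, not in the source.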
## Manual vectorization in the STL
Certain algorithms for x64 and x86 include manual vectorization. This implementation is separately compiled and relies on runtime CPU dispatch, so it applies only to suitable CPUs.
Manually vectorized algorithms use template metaprogramming to detect if the element type is suitable for vectorization. As a result, they're only vectorized for simple types such as standard integer types.
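
The compile-time check described above could look roughly like the following. This is a sketch of the general technique only, not the MSVC STL's internal code: the trait `is_vector_candidate`, the type `Point`, and the function `uses_vector_path` are all hypothetical names, and the rule "any integral type qualifies" is an assumption made for the example.

```cpp
#include <type_traits>

// Hypothetical trait (an assumption, not the real STL check): treat
// integral element types as candidates for a vectorized code path.
template <class T>
struct is_vector_candidate
    : std::integral_constant<bool, std::is_integral<T>::value> {};

// Hypothetical user-defined type; it isn't integral, so it falls back
// to the scalar path in this sketch.
struct Point {
    int x;
    int y;
};

// Tag dispatch: the overload is selected at compile time from the trait.
inline bool uses_vector_path_impl(std::true_type) { return true; }
inline bool uses_vector_path_impl(std::false_type) { return false; }

// Reports which path this sketch would choose for element type T.
template <class T>
bool uses_vector_path() {
    return uses_vector_path_impl(
        std::integral_constant<bool, is_vector_candidate<T>::value>{});
}
```

Because the choice is made from the element type at compile time, unsuitable types take the scalar path without any runtime cost for the check.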
-Programs generally either benefit in performance from manual vectorization or remain unaffected by it. Disable manual vectorization by defining `_USE_STD_VECTOR_ALGORITHMS=0` in your project. Manually vectorized algorithms are enabled by default on x64 and x86 because it defaults to 1 on those platforms.
+Programs either benefit in performance from manual vectorization or remain unaffected by it. Disable manual vectorization by defining `_USE_STD_VECTOR_ALGORITHMS=0` in your project. Manually vectorized algorithms are enabled by default on x64 and x86 because `_USE_STD_VECTOR_ALGORITHMS` defaults to 1 on those platforms.
Assign the same value to `_USE_STD_VECTOR_ALGORITHMS` for all linked translation units that use algorithms. Configure it in the project properties instead of in the source code for consistency. For more information about how to configure it, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
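
To confirm which setting a translation unit ends up with, a small check along these lines may help. The function name is hypothetical, and the sketch assumes the macro is visible after a standard library header is included, which is expected with the MSVC STL; with other standard libraries the macro is normally undefined and the fallback branch is taken.

```cpp
#include <algorithm>  // assumed to make _USE_STD_VECTOR_ALGORITHMS visible on MSVC

// Hypothetical helper: returns the value of _USE_STD_VECTOR_ALGORITHMS as
// seen by this translation unit, or -1 when the macro isn't defined.
inline int std_vector_algorithms_setting() {
#ifdef _USE_STD_VECTOR_ALGORITHMS
    return _USE_STD_VECTOR_ALGORITHMS;
#else
    return -1;
#endif
}
```

Calling this helper from several translation units and comparing the results is one way to spot an inconsistent definition.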
@@ -72,7 +72,7 @@ The STL addresses the first two considerations safely. Only `max_element`, `min_
Use `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating-point types. Set it to 0 to disable vectorization. `_USE_STD_VECTOR_FLOATING_ALGORITHMS` doesn't affect anything if `_USE_STD_VECTOR_ALGORITHMS` is set to 0.
-`_USE_STD_VECTOR_FLOATING_ALGORITHMS` defaults to 0 when [`/fp:except`](../build/reference/fp-specify-floating-point-behavior.md#except) is set.
+The `_USE_STD_VECTOR_FLOATING_ALGORITHMS` macro defaults to 0 when [`/fp:except`](../build/reference/fp-specify-floating-point-behavior.md#except) is set.
Assign the same value to `_USE_STD_VECTOR_FLOATING_ALGORITHMS` for all linked translation units that use algorithms. Configure it in the project properties instead of in the source code for consistency. For more information about how to configure it, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).