Commit f890420

edit pass
1 parent d6f6bd9 commit f890420

1 file changed

+37
-53
lines changed

docs/standard-library/vectorized-stl-algorithms.md

Lines changed: 37 additions & 53 deletions
@@ -7,89 +7,73 @@ helpviewer_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_AL
77
---
88
# Vectorized STL Algorithms
99

10-
Under certain conditions, STL algorithms execute not element-wise, but multiple elements at once on a single CPU core. This is possible due to SIMD (single instruction, multiple data). The use of such an approach instead of element-wise approach is called vectorization. An implementation that is not vectorized is called scalar.
10+
Under specific conditions, algorithms in the C++ Standard Template Library (STL) can process multiple elements simultaneously on a single CPU core, rather than handling each element individually. This optimization leverages single instruction, multiple data (SIMD) instructions provided by the CPU, a technique known as vectorization. When this optimization isn't applied, the implementation is referred to as scalar.
1111

12-
The conditions for vectorization are:
13-
- The container or range is contiguous. `array`, `vector`, and `basic_string` are contiguous containers, `span` and `basic_string_view` provide contiguous ranges. Built-in array elements also form contiguous ranges. In contrast, `list` and `map` are not contiguous containers.
14-
- There are such SIMD instructions available for the target platform that implement the particular algorithm on particular element types efficiently. Often this is true for plain types (like built-in integers) and simple operations.
15-
- Either of the following:
16-
- The compiler is capable of emitting vectorized machine code for an implementation written as scalar code (auto-vectorization)
17-
- The implementation itself is written as vectorized code (manual vectorization)
12+
The conditions required for vectorization are:
13+
- The container or range must be contiguous. Examples of contiguous containers include `array`, `vector`, and `basic_string`. Contiguous ranges are provided by types like `span` and `basic_string_view`. Built-in arrays also form contiguous ranges. In contrast, containers like `list` and `map` aren't contiguous.
14+
- The target platform must support the necessary SIMD instructions to implement the algorithm for the element types. This is typically true for intrinsic types (like built-in integers) and simple operations.
15+
- One of the following conditions must be met:
16+
- The compiler can emit vectorized machine code for an implementation written as scalar code (auto-vectorization).
17+
- The algorithm's implementation is explicitly written to use vectorized code (manual vectorization).
1818

19-
## Auto-vectorization in STL
19+
## Auto-vectorization in the STL
2020

21-
See [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) and the discussion of [`/arch`](../build/reference/arch-minimum-cpu-architecture.md) switch there. It applies to the STL implementation code the same way as to user code.
21+
For more information about automatic vectorization, see [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) and the discussion of the [`/arch`](../build/reference/arch-minimum-cpu-architecture.md) switch in that article. Auto-vectorization applies to the STL implementation code the same way as to user code.
2222

2323
Algorithms like `transform`, `reduce`, and `accumulate` benefit heavily from auto-vectorization.
2424

25-
## Manual vectorization in STL
25+
## Manual vectorization in the STL
2626

27-
For x64 and x86 targets, certain algorithms have manual vectorization implemented. This implementation is separately compiled, and uses runtime CPU dispatch, so it is engaged on suitable CPUs only.
27+
For x64 and x86, certain algorithms include manual vectorization. This implementation is separately compiled, and uses runtime CPU dispatch, so it's only used on suitable CPUs.
2828

29-
The manually vectorized algorithms use template meta-programming to detect the suitable element types, so they are only vectorized for simple types, like standard integer types.
29+
Manually vectorized algorithms use template meta-programming to detect whether the element type is suitable for vectorization. As a result, they're only vectorized for simple types such as standard integer types.
3030

31-
Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. In case of any problem, you can disable manual vectorization by defining the `_USE_STD_VECTOR_ALGORITHMS` macro set to 0. It defaults to 1 on x64 and x86, which means that manually vectorized algorithms are enabled by default.
31+
Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. You can disable manual vectorization by defining `_USE_STD_VECTOR_ALGORITHMS` as 0, for example with `#define _USE_STD_VECTOR_ALGORITHMS 0`. The macro defaults to 1 on x64 and x86, so manually vectorized algorithms are enabled by default on those platforms.
3232

33-
When overriding `_USE_STD_VECTOR_ALGORITHMS` make sure to set the same value for all linked translation units that use algorithms. Reliable way to achieve that is using project properties rather than defining it in the source. See [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md) compiler option.
33+
When you override `_USE_STD_VECTOR_ALGORITHMS`, set it to the same value for all linked translation units that use algorithms. A reliable way to achieve this is to set it in the project properties instead of in source. For more information about how to do that, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
3434

35-
The following algorithms have manual vectorization controlled via `_USE_STD_VECTOR_ALGORITHMS` macro:
36-
- `contains`
37-
- `contains_subrange`
38-
- `find`
39-
- `find_last`
40-
- `find_end`
41-
- `find_first_of`
42-
- `adjacent_find`
35+
The following algorithms are manually vectorized and their behavior is controlled by the `_USE_STD_VECTOR_ALGORITHMS` macro:
36+
- `contains`, `contains_subrange`
37+
- `find`, `find_last`, `find_end`, `find_first_of`, `adjacent_find`
4338
- `count`
4439
- `mismatch`
45-
- `search`
46-
- `search_n`
40+
- `search`, `search_n`
4741
- `swap_ranges`
4842
- `replace`
49-
- `remove`
50-
- `remove_copy`
51-
- `unique`
52-
- `unique_copy`
53-
- `reverse`
54-
- `reverse_copy`
43+
- `remove`, `remove_copy`
44+
- `unique`, `unique_copy`
45+
- `reverse`, `reverse_copy`
5546
- `rotate`
56-
- `is_sorted`
57-
- `is_sorted_until`
58-
- `max_element`
59-
- `min_element`
60-
- `minmax_element`
61-
- `max`
62-
- `min`
63-
- `minmax`
64-
- `lexicographical_compare`
65-
- `lexicographical_compare_three_way`
66-
67-
In addition to algorithms, the macro controls the manual vectorization of:
47+
- `is_sorted`, `is_sorted_until`
48+
- `lexicographical_compare`, `lexicographical_compare_three_way`
49+
- `max`, `min`, `minmax`
50+
- `max_element`, `min_element`, `minmax_element`
51+
52+
In addition to algorithms, the `_USE_STD_VECTOR_ALGORITHMS` macro controls the manual vectorization of:
53+
6854
- `basic_string` and `basic_string_view` members:
6955
- `find`
7056
- `rfind`
71-
- `find_first_of`
72-
- `find_first_not_of`
73-
- `find_last_of`
74-
- `find_last_not_of`
57+
- `find_first_of`, `find_first_not_of`
58+
- `find_last_of`, `find_last_not_of`
7559
- `bitset` constructors from string and `bitset::to_string`
7660

7761
## Manually vectorized algorithms for floating point types
7862

79-
Vectorization of floating point types comes with extra difficulties:
63+
Vectorization of floating point types requires additional considerations:
8064
- Vectorization may reorder operations, which can affect the precision of floating point results.
8165
- Floating point types may contain `NaN` values, which don't behave transitively on comparisons.
8266
- Floating point operations may raise exceptions.
8367

84-
The STL deals with the first two difficulties safely. Only `max_element`, `min_element`, `minmax_element`, `max`, `min`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized. These algorithms:
85-
- Do not compute new floating point values, only compare the existing values, so different order does not affect precision.
86-
- Because they are sorting algorithms, `NaNs` are not allowed amongst the operands.
68+
The STL deals with the first two considerations safely. Only `max_element`, `min_element`, `minmax_element`, `max`, `min`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized. These algorithms:
69+
- Don't compute new floating point values, only compare the existing values, so a different order of operations doesn't affect precision.
70+
- Don't allow `NaN` values among their operands, because they're sorting algorithms.
8771

88-
There's `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating point types. Set it to 0 to disable the vectorization. The macro has no effect if `_USE_STD_VECTOR_ALGORITHMS` is set to 0.
72+
Use `_USE_STD_VECTOR_FLOATING_ALGORITHMS` to control the use of these vectorized algorithms for floating point types. Set it to 0 to disable vectorization. `_USE_STD_VECTOR_FLOATING_ALGORITHMS` has no effect if `_USE_STD_VECTOR_ALGORITHMS` is set to 0.
8973

90-
`_USE_STD_VECTOR_FLOATING_ALGORITHMS` defaults to 0 when [`/fp:except`](../build/reference/fp-specify-floating-point-behavior.md#except) option is set. This is to avoid problems with exceptions.
74+
`_USE_STD_VECTOR_FLOATING_ALGORITHMS` defaults to 0 when [`/fp:except`](../build/reference/fp-specify-floating-point-behavior.md#except) is set.
9175

92-
When overriding `_USE_STD_VECTOR_FLOATING_ALGORITHMS` make sure to set the same value for all linked translation units that use algorithms. Reliable way to achieve that is using project properties rather than defining it in the source.
76+
When you override `_USE_STD_VECTOR_FLOATING_ALGORITHMS`, set it to the same value for all linked translation units that use algorithms. A reliable way to achieve this is to set it in the project properties instead of in source. For more information about how to do that, see [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md).
9377

9478
## See also
9579
