| 1 | +--- |
| 2 | +title: "Vectorized STL Algorithms" |
| 3 | +description: "Learn more about: Vectorized STL Algorithms" |
| 4 | +ms.date: 09/19/2025 |
| 5 | +f1_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_ALGORITHMS"] |
| 6 | +helpviewer_keywords: ["_USE_STD_VECTOR_ALGORITHMS", "_USE_STD_VECTOR_FLOATING_ALGORITHMS", "Vector Algorithms", "Vectorization", "SIMD"] |
| 7 | +--- |
| 8 | +# Vectorized STL Algorithms |
| 9 | + |
| 10 | +Under certain conditions, STL algorithms process multiple elements at once on a single CPU core instead of one element at a time. This is possible because of SIMD (single instruction, multiple data) instructions. Using this approach instead of the element-wise approach is called vectorization. An implementation that isn't vectorized is called scalar. |
| 11 | + |
| 12 | +The conditions for vectorization are: |
| 13 | + - The container or range is contiguous. `array`, `vector`, and `basic_string` are contiguous containers; `span` and `basic_string_view` provide contiguous ranges. The elements of a built-in array also form a contiguous range. In contrast, `list` and `map` aren't contiguous containers. |
| 14 | + - The target platform has SIMD instructions that efficiently implement the particular algorithm for the particular element types. This is often true for plain types (like built-in integers) and simple operations. |
| 15 | + - Either of the following: |
| 16 | +    - The compiler is capable of emitting vectorized machine code for an implementation written as scalar code (auto-vectorization). |
| 17 | +    - The implementation itself is written as vectorized code (manual vectorization). |
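
As a minimal sketch of the contiguity condition, the following C++ runs the same algorithm over a contiguous container and over a node-based one. The helper names `find_in_vector` and `find_in_list` are hypothetical and exist only for illustration. Both calls return the same result; only the `vector` call meets the contiguity condition, and whether it is actually vectorized still depends on the other conditions above.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper: position of `value` in a contiguous container.
// The elements are adjacent in memory, so this call meets the contiguity condition.
std::ptrdiff_t find_in_vector(const std::vector<int>& v, int value) {
    return std::distance(v.begin(), std::find(v.begin(), v.end(), value));
}

// Hypothetical helper: the same search over a node-based container.
// The result is the same, but the elements aren't contiguous,
// so the search can't process several elements with one SIMD load.
std::ptrdiff_t find_in_list(const std::list<int>& l, int value) {
    return std::distance(l.begin(), std::find(l.begin(), l.end(), value));
}
```

When the value is absent, both helpers return the size of the container, because `std::find` returns the end iterator.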
| 18 | + |
| 19 | +## Auto-vectorization in STL |
| 20 | + |
| 21 | +See [Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) and the discussion of the [`/arch`](../build/reference/arch-minimum-cpu-architecture.md) switch there. Auto-vectorization applies to the STL implementation code the same way it applies to user code. |
| 22 | + |
| 23 | +Algorithms such as `transform`, `reduce`, and `accumulate` benefit heavily from auto-vectorization. |
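
The following sketch shows the kind of call that is a candidate for auto-vectorization: simple arithmetic over contiguous `int` data. The helper name `doubled_sum` is hypothetical. Whether the compiler actually emits SIMD instructions here is an assumption that depends on the optimization settings and the `/arch` option in use.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper: doubles every element, then sums the results.
// Both steps are simple operations over contiguous `int` data, which is
// the kind of scalar code an auto-vectorizer may turn into SIMD instructions.
int doubled_sum(const std::vector<int>& input) {
    std::vector<int> doubled(input.size());
    std::transform(input.begin(), input.end(), doubled.begin(),
                   [](int x) { return x * 2; });
    return std::accumulate(doubled.begin(), doubled.end(), 0);
}
```

The result is the same whether or not the loops are vectorized; only the generated machine code differs.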
| 24 | + |
| 25 | +## Manual vectorization in STL |
| 26 | + |
| 27 | +For x64 and x86 targets, certain algorithms are manually vectorized. This implementation is compiled separately and uses runtime CPU dispatch, so it's used only on suitable CPUs. |
| 28 | + |
| 29 | +The manually vectorized algorithms use template metaprogramming to detect suitable element types, so they're vectorized only for simple types, such as the standard integer types. |
| 30 | + |
| 31 | +Generally, programs either benefit in performance from this manual vectorization or are unaffected by it. If you encounter a problem, you can disable manual vectorization by defining the `_USE_STD_VECTOR_ALGORITHMS` macro as 0. The macro defaults to 1 on x64 and x86, which means that manually vectorized algorithms are enabled by default. |
| 32 | + |
| 33 | +When you override `_USE_STD_VECTOR_ALGORITHMS`, make sure to set the same value for all linked translation units that use algorithms. A reliable way to achieve that is to use project properties rather than defining the macro in the source. See the [/D (Preprocessor Definitions)](../build/reference/d-preprocessor-definitions.md) compiler option. |
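
To check which value a translation unit sees, a small diagnostic along the following lines can help. This is only a sketch: the helper name `vector_algorithms_setting` is hypothetical, and the code assumes that the library's default definition of the macro is visible after including `<algorithm>`.

```cpp
#include <algorithm>

// Hypothetical helper: reports the value of _USE_STD_VECTOR_ALGORITHMS that
// this translation unit sees, or -1 if the macro isn't defined (for example,
// when the code is built with a different standard library).
int vector_algorithms_setting() {
#ifdef _USE_STD_VECTOR_ALGORITHMS
    return _USE_STD_VECTOR_ALGORITHMS;
#else
    return -1;
#endif
}
```

For example, if every compilation receives `/D_USE_STD_VECTOR_ALGORITHMS=0`, the helper should return 0 in each translation unit. A different value in one of them would indicate the kind of mismatch described above.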
| 34 | + |
| 35 | +The following algorithms have manual vectorization that's controlled by the `_USE_STD_VECTOR_ALGORITHMS` macro: |
| 36 | + - `contains` |
| 37 | + - `contains_subrange` |
| 38 | + - `find` |
| 39 | + - `find_last` |
| 40 | + - `find_end` |
| 41 | + - `find_first_of` |
| 42 | + - `adjacent_find` |
| 43 | + - `count` |
| 44 | + - `mismatch` |
| 45 | + - `search` |
| 46 | + - `search_n` |
| 47 | + - `swap_ranges` |
| 48 | + - `replace` |
| 49 | + - `remove` |
| 50 | + - `remove_copy` |
| 51 | + - `unique` |
| 52 | + - `unique_copy` |
| 53 | + - `reverse` |
| 54 | + - `reverse_copy` |
| 55 | + - `rotate` |
| 56 | + - `is_sorted` |
| 57 | + - `is_sorted_until` |
| 58 | + - `max_element` |
| 59 | + - `min_element` |
| 60 | + - `minmax_element` |
| 61 | + - `max` |
| 62 | + - `min` |
| 63 | + - `minmax` |
| 64 | + - `lexicographical_compare` |
| 65 | + - `lexicographical_compare_three_way` |
| 66 | + |
| 67 | +In addition to algorithms, the macro controls the manual vectorization of: |
| 68 | + - `basic_string` and `basic_string_view` members: |
| 69 | +    - `find` |
| 70 | +    - `rfind` |
| 71 | +    - `find_first_of` |
| 72 | +    - `find_first_not_of` |
| 73 | +    - `find_last_of` |
| 74 | +    - `find_last_not_of` |
| 75 | + - `bitset` constructors from string and `bitset::to_string` |
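
As a brief illustration of the members listed above, the following sketch calls `basic_string_view::find_first_of`, the `bitset` constructor from a string, and `bitset::to_string`. The helper names `first_digit` and `round_trip_bits` are hypothetical. The calls are ordinary standard library usage; any manual vectorization happens inside the library and doesn't change the results.

```cpp
#include <bitset>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: index of the first ASCII digit in `text`,
// or std::string_view::npos if there is none.
std::size_t first_digit(std::string_view text) {
    return text.find_first_of("0123456789");
}

// Hypothetical helper: parses eight '0'/'1' characters into a bitset,
// then converts the bitset back to a string.
std::string round_trip_bits(const std::string& bits) {
    std::bitset<8> parsed(bits);  // bitset constructor from string
    return parsed.to_string();    // bitset::to_string
}
```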
| 76 | + |
| 77 | +## Manually vectorized algorithms for floating point types |
| 78 | + |
| 79 | +Vectorization of floating point types comes with extra difficulties: |
| 80 | + - Vectorization may reorder operations, which can affect the precision of floating point results. |
| 81 | + - Floating point types may contain `NaN` values, which don't behave transitively on comparisons. |
| 82 | + - Floating point operations may raise exceptions. |
| 83 | + |
| 84 | +The STL deals with the first two difficulties safely. Only `max_element`, `min_element`, `minmax_element`, `max`, `min`, `minmax`, `is_sorted`, and `is_sorted_until` are manually vectorized for floating point types. These algorithms: |
| 85 | + - Don't compute new floating point values. They only compare existing values, so a different order of operations doesn't affect precision. |
| 86 | + - Are sorting-related algorithms, so `NaN` values aren't allowed among the operands. |
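
The two properties above can be sketched as follows. The helper names `has_no_nan` and `largest` are hypothetical. The first helper checks the precondition that no `NaN` is present; the second only compares existing values, so its result doesn't depend on the order in which elements are visited.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: true if no element is NaN, which is the precondition
// for using the sorting-related algorithms on floating point values.
bool has_no_nan(const std::vector<double>& values) {
    return std::none_of(values.begin(), values.end(),
                        [](double x) { return std::isnan(x); });
}

// Hypothetical helper: largest element of a non-empty, NaN-free sequence.
// The algorithm only compares existing values and computes no new ones.
double largest(const std::vector<double>& values) {
    return *std::max_element(values.begin(), values.end());
}
```

Callers are expected to pass a non-empty sequence to `largest`, because dereferencing the end iterator is undefined behavior.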
| 87 | + |
| 88 | +The `_USE_STD_VECTOR_FLOATING_ALGORITHMS` macro controls the use of these vectorized algorithms for floating point types. Set it to 0 to disable the vectorization. The macro has no effect if `_USE_STD_VECTOR_ALGORITHMS` is set to 0. |
| 89 | + |
| 90 | +`_USE_STD_VECTOR_FLOATING_ALGORITHMS` defaults to 0 when the [`/fp:except`](../build/reference/fp-specify-floating-point-behavior.md#except) option is set. This default avoids problems with floating point exceptions. |
| 91 | + |
| 92 | +When you override `_USE_STD_VECTOR_FLOATING_ALGORITHMS`, make sure to set the same value for all linked translation units that use algorithms. A reliable way to achieve that is to use project properties rather than defining the macro in the source. |
| 93 | + |
| 94 | +## See also |
| 95 | + |
| 96 | +[Auto-Vectorizer](../parallel/auto-parallelization-and-auto-vectorization.md#auto-vectorizer) |