@AlexGuteniev AlexGuteniev commented Feb 8, 2026

🗺️ Where it is

  • Bitmap algorithms for basic_string[_view]::find_{first|last}_[not_]of use a bitmap over the character values [0, 255], if the characters fit.
  • The bitmap algorithms take O(n+m) steps instead of the O(n*m) steps of the direct implementation.
  • We dispatch at runtime between two bitmap algorithms and the direct algorithms, based on the input lengths.
  • The AVX2 bitmap algorithm is the fastest for a large haystack, but its needle processing is slow.
  • The AVX2 bitmap is stored in a __m256i variable. It takes a few complex instructions per element to populate one bit.
  • For a large enough needle it is better to form a temporary bitmap of bytes, and then compress it to bits in a constant number of steps. For smaller needles the extra conversion is relatively slow, so it is not justified.
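
To make the O(n+m) point concrete, here is a minimal scalar sketch of the bitmap idea. It is not the STL's internal code: `bitmap_find_first_of` is a hypothetical name, and the sketch only handles `char`, where every value fits in [0, 255].

```cpp
#include <bitset>
#include <cstddef>
#include <string_view>

// Hypothetical helper, for illustration only: returns the index of the first
// haystack character that occurs in needle, or npos if there is none.
inline std::size_t bitmap_find_first_of(std::string_view haystack, std::string_view needle) {
    std::bitset<256> seen; // one bit per possible 8-bit character value

    // O(m): mark every needle character once.
    for (const char ch : needle) {
        seen.set(static_cast<unsigned char>(ch));
    }

    // O(n): one bitmap lookup per haystack character.
    for (std::size_t i = 0; i < haystack.size(); ++i) {
        if (seen.test(static_cast<unsigned char>(haystack[i]))) {
            return i;
        }
    }
    return std::string_view::npos;
}
```

The direct implementation would instead scan the whole needle for each haystack character, which is where the O(n*m) comes from.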

This optimization targets populating the bitmap from a small needle.

🧠 Optimization

The new approach does not explicitly split the element value into low and high parts. Instead, it relies on the fact that _mm256_sllv_epi32/vpsllvd zeroes a destination element when its shift count is greater than or equal to the element bit count. We broadcast the source value to all elements of an AVX2 vector and xor its high 3 bits with an incrementing pattern, so exactly one element becomes less than 32, and only that element shifts a one to the position given by the shift value.
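
The shift trick can be checked with a portable scalar model. This is a sketch, not the PR's code: it uses no intrinsics, `sllv_element` only imitates the per-element behavior of vpsllvd, and all names are made up for illustration. `set_bit_split` is just the textbook explicit split, used as a reference.

```cpp
#include <array>
#include <cstdint>

// Stand-in for one __m256i viewed as eight 32-bit elements.
using Bitmap256 = std::array<std::uint32_t, 8>;

// Models one element of _mm256_sllv_epi32 / vpsllvd:
// a shift count of 32 or more yields zero instead of wrapping around.
inline std::uint32_t sllv_element(std::uint32_t value, std::uint32_t count) {
    return count < 32 ? value << count : 0u;
}

// Reference form: explicit split into 3 high bits (element index)
// and 5 low bits (bit index within the element).
inline void set_bit_split(Bitmap256& bitmap, std::uint8_t ch) {
    bitmap[ch >> 5] |= std::uint32_t{1} << (ch & 31);
}

// Shift-based form: no explicit split. Every element gets the same value,
// xored with its own index placed in bits 5..7 (0, 32, 64, ..., 224).
// Only the element whose index matches the high 3 bits ends up below 32.
inline void set_bit_sllv(Bitmap256& bitmap, std::uint8_t ch) {
    for (std::uint32_t element = 0; element < 8; ++element) {
        const std::uint32_t count = std::uint32_t{ch} ^ (element << 5);
        bitmap[element] |= sllv_element(1u, count);
    }
}

// Returns true if both forms produce the same bitmap for every 8-bit value.
inline bool forms_agree() {
    for (unsigned value = 0; value < 256; ++value) {
        Bitmap256 a{};
        Bitmap256 b{};
        set_bit_split(a, static_cast<std::uint8_t>(value));
        set_bit_sllv(b, static_cast<std::uint8_t>(value));
        if (a != b) {
            return false;
        }
    }
    return true;
}
```

In the vectorized version the loop over elements disappears: the xor and the variable shift each handle all eight elements in one instruction.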

The new approach has approximately the same cost in vector instructions, but it saves all the scalar steps, which amount to about four logical/shift instructions.

Instead of using 32-bit elements and splitting the source value into 3 high bits and 5 low bits, an alternative with 64-bit elements and 2 high / 6 low bits is possible. This alternative has exactly the same performance properties, but is a bit more squirrelly to get working on 32-bit x86, so there's no point in doing that. (In contrast, the old approach used 64-bit elements and had a 32-bit alternative, and that alternative was also hard to get working on 32-bit x86.)
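
For comparison, the 64-bit alternative looks like this in a portable scalar model. Again a sketch with made-up names and no intrinsics; `sllv64_element` imitates one element of _mm256_sllv_epi64, which zeroes the result for counts of 64 or more.

```cpp
#include <array>
#include <cstdint>

// Stand-in for one __m256i viewed as four 64-bit elements.
using Bitmap256x64 = std::array<std::uint64_t, 4>;

// Models one element of _mm256_sllv_epi64 / vpsllvq:
// a shift count of 64 or more yields zero.
inline std::uint64_t sllv64_element(std::uint64_t value, std::uint64_t count) {
    return count < 64 ? value << count : 0u;
}

// 2 high bits select the element, 6 low bits select the bit, but without
// an explicit split: the element index is xored into bits 6..7 instead.
inline void set_bit_sllv64(Bitmap256x64& bitmap, std::uint8_t ch) {
    for (std::uint64_t element = 0; element < 4; ++element) {
        const std::uint64_t count = std::uint64_t{ch} ^ (element << 6);
        bitmap[element] |= sllv64_element(1u, count);
    }
}
```

For example, the value 200 has high bits 3 and low bits 8, so it sets bit 8 of element 3 and leaves the other elements untouched.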

✊ Force inline

There's a codegen issue where the compiler inserts non-VEX-prefixed SSE instructions on the AVX2 path in the function epilog. This turned the optimization into a major pessimization. The easiest way around it was to eliminate that epilog, along with the function, by forcing inlining. Apart from that, the performance impact of forced inlining also seems positive.
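
A rough sketch of what forcing inlining looks like. The macro and function names here are hypothetical, and the helper is a scalar stand-in rather than the real AVX2 code; the point is only that a force-inlined helper has no prolog or epilog of its own.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical portability macro; __forceinline is the MSVC spelling.
#if defined(_MSC_VER)
#define EXAMPLE_FORCEINLINE __forceinline
#elif defined(__GNUC__)
#define EXAMPLE_FORCEINLINE inline __attribute__((always_inline))
#else
#define EXAMPLE_FORCEINLINE inline
#endif

// Scalar stand-in for the bitmap-populating step of one 32-bit element.
EXAMPLE_FORCEINLINE std::uint32_t element_bit(std::uint32_t ch, std::uint32_t element) {
    assert(ch < 256 && element < 8);
    const std::uint32_t count = ch ^ (element << 5);
    return count < 32 ? std::uint32_t{1} << count : 0u;
}

// Caller: the helper body is expanded here, so no separate function
// (and no separate epilog) is generated for it.
inline std::uint32_t element_mask(std::string_view needle, std::uint32_t element) {
    std::uint32_t mask = 0;
    for (const char ch : needle) {
        mask |= element_bit(static_cast<unsigned char>(ch), element);
    }
    return mask;
}
```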

⚖️ Balance change

As one of the algorithms became faster while the others stayed the same, we can adjust the thresholds to get the most out of it. That tuning is more tedious than the optimization itself, though, and without it things won't be worse; it's just a possibly missed opportunity.
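
To illustrate what a threshold adjustment means, here is a toy dispatch. Everything in it is an assumption made for illustration: the shape of the conditions, the names, and especially the numbers are hypothetical and are not the values used in the STL.

```cpp
#include <cstddef>

enum class Strategy { direct, scalar_bitmap, avx2_bitmap };

// Hypothetical tuning knobs; the defaults are made-up numbers.
struct Thresholds {
    std::size_t direct_max_work   = 64;  // below this n*m, the direct loop is cheap enough
    std::size_t avx2_min_haystack = 256; // from this haystack length on, AVX2 pays off
};

// Picks an algorithm from the input lengths only, as a runtime dispatch would.
constexpr Strategy pick_strategy(
    std::size_t haystack_len, std::size_t needle_len, Thresholds t = Thresholds{}) {
    if (haystack_len * needle_len < t.direct_max_work) {
        return Strategy::direct;
    }
    if (haystack_len >= t.avx2_min_haystack) {
        return Strategy::avx2_bitmap;
    }
    return Strategy::scalar_bitmap;
}
```

When the AVX2 needle processing gets cheaper, the break-even point moves, so a knob like `avx2_min_haystack` could be lowered to send more cases to the AVX2 path. Leaving it alone only forgoes that gain.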

⏱️ Benchmark results

Featured results:

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| bm<AlgType::str_member_first, char>/1011/11 | 99.2 ns | 84.6 ns | 1.17 |
| bm<AlgType::str_member_first, wchar_t>/325/1 | 54.5 ns | 37.8 ns | 1.44 |
| bm<AlgType::str_member_first, wchar_t>/1011/11 | 129 ns | 108 ns | 1.19 |
| bm<AlgType::str_member_last, char>/1011/11 | 101 ns | 85.5 ns | 1.18 |
| bm<AlgType::str_member_last, wchar_t>/325/1 | 40.1 ns | 35.9 ns | 1.12 |
| bm<AlgType::str_member_first_not, char>/1011/11 | 103 ns | 89.0 ns | 1.16 |
| bm<AlgType::str_member_first_not, wchar_t>/325/1 | 42.2 ns | 37.8 ns | 1.12 |
| bm<AlgType::str_member_first_not, wchar_t>/1011/11 | 131 ns | 108 ns | 1.21 |
| bm<AlgType::str_member_last_not, char>/1011/11 | 94.7 ns | 85.9 ns | 1.10 |
| bm<AlgType::str_member_last_not, wchar_t>/325/1 | 40.2 ns | 36.9 ns | 1.09 |
| bm<AlgType::str_member_last_not, wchar_t>/1011/11 | 128 ns | 106 ns | 1.21 |

All results:
Benchmark Before After Speedup
bm<AlgType::str_member_first, char>/2/3 5.43 ns 5.69 ns 0.95
bm<AlgType::str_member_first, char>/6/81 21.3 ns 20.6 ns 1.03
bm<AlgType::str_member_first, char>/7/4 13.9 ns 12.6 ns 1.10
bm<AlgType::str_member_first, char>/9/3 12.8 ns 11.8 ns 1.08
bm<AlgType::str_member_first, char>/22/5 13.3 ns 12.4 ns 1.07
bm<AlgType::str_member_first, char>/58/2 14.8 ns 13.9 ns 1.06
bm<AlgType::str_member_first, char>/75/85 37.9 ns 37.7 ns 1.01
bm<AlgType::str_member_first, char>/102/4 17.0 ns 15.5 ns 1.10
bm<AlgType::str_member_first, char>/200/46 36.9 ns 36.3 ns 1.02
bm<AlgType::str_member_first, char>/325/1 36.0 ns 33.3 ns 1.08
bm<AlgType::str_member_first, char>/400/50 51.5 ns 48.8 ns 1.06
bm<AlgType::str_member_first, char>/1011/11 99.2 ns 84.6 ns 1.17
bm<AlgType::str_member_first, char>/1280/46 120 ns 102 ns 1.18
bm<AlgType::str_member_first, char>/1502/23 130 ns 110 ns 1.18
bm<AlgType::str_member_first, char>/2203/54 186 ns 181 ns 1.03
bm<AlgType::str_member_first, char>/3056/7 227 ns 218 ns 1.04
bm<AlgType::str_member_first, wchar_t>/2/3 5.77 ns 6.05 ns 0.95
bm<AlgType::str_member_first, wchar_t>/6/81 42.1 ns 40.9 ns 1.03
bm<AlgType::str_member_first, wchar_t>/7/4 11.6 ns 11.0 ns 1.05
bm<AlgType::str_member_first, wchar_t>/9/3 14.3 ns 14.3 ns 1.00
bm<AlgType::str_member_first, wchar_t>/22/5 14.9 ns 14.7 ns 1.01
bm<AlgType::str_member_first, wchar_t>/58/2 19.4 ns 19.0 ns 1.02
bm<AlgType::str_member_first, wchar_t>/75/85 46.8 ns 46.3 ns 1.01
bm<AlgType::str_member_first, wchar_t>/102/4 21.2 ns 20.2 ns 1.05
bm<AlgType::str_member_first, wchar_t>/200/46 43.0 ns 43.1 ns 1.00
bm<AlgType::str_member_first, wchar_t>/325/1 54.5 ns 37.8 ns 1.44
bm<AlgType::str_member_first, wchar_t>/400/50 62.0 ns 60.9 ns 1.02
bm<AlgType::str_member_first, wchar_t>/1011/11 129 ns 108 ns 1.19
bm<AlgType::str_member_first, wchar_t>/1280/46 160 ns 144 ns 1.11
bm<AlgType::str_member_first, wchar_t>/1502/23 175 ns 159 ns 1.10
bm<AlgType::str_member_first, wchar_t>/2203/54 249 ns 251 ns 0.99
bm<AlgType::str_member_first, wchar_t>/3056/7 311 ns 318 ns 0.98
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/2/3 13.3 ns 13.4 ns 0.99
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/6/81 26.9 ns 24.4 ns 1.10
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/7/4 10.8 ns 10.8 ns 1.00
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/9/3 14.6 ns 14.4 ns 1.01
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/22/5 15.2 ns 15.0 ns 1.01
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/58/2 19.3 ns 19.0 ns 1.02
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/75/85 185 ns 170 ns 1.09
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/102/4 26.2 ns 26.2 ns 1.00
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/200/46 276 ns 244 ns 1.13
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/325/1 65.5 ns 65.9 ns 0.99
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/400/50 593 ns 548 ns 1.08
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/1011/11 466 ns 404 ns 1.15
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/1280/46 1626 ns 1416 ns 1.15
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/1502/23 993 ns 852 ns 1.17
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/2203/54 3161 ns 2915 ns 1.08
bm<AlgType::str_member_first, wchar_t, L'\x03B1'>/3056/7 562 ns 564 ns 1.00
bm<AlgType::str_member_first, char32_t>/2/3 5.67 ns 5.60 ns 1.01
bm<AlgType::str_member_first, char32_t>/6/81 24.2 ns 24.6 ns 0.98
bm<AlgType::str_member_first, char32_t>/7/4 11.1 ns 10.4 ns 1.07
bm<AlgType::str_member_first, char32_t>/9/3 11.8 ns 11.0 ns 1.07
bm<AlgType::str_member_first, char32_t>/22/5 14.1 ns 13.7 ns 1.03
bm<AlgType::str_member_first, char32_t>/58/2 15.8 ns 16.2 ns 0.98
bm<AlgType::str_member_first, char32_t>/75/85 51.4 ns 47.2 ns 1.09
bm<AlgType::str_member_first, char32_t>/102/4 19.9 ns 18.3 ns 1.09
bm<AlgType::str_member_first, char32_t>/200/46 43.7 ns 40.8 ns 1.07
bm<AlgType::str_member_first, char32_t>/325/1 51.3 ns 36.3 ns 1.41
bm<AlgType::str_member_first, char32_t>/400/50 59.7 ns 59.4 ns 1.01
bm<AlgType::str_member_first, char32_t>/1011/11 114 ns 109 ns 1.05
bm<AlgType::str_member_first, char32_t>/1280/46 144 ns 138 ns 1.04
bm<AlgType::str_member_first, char32_t>/1502/23 153 ns 152 ns 1.01
bm<AlgType::str_member_first, char32_t>/2203/54 239 ns 244 ns 0.98
bm<AlgType::str_member_first, char32_t>/3056/7 270 ns 270 ns 1.00
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/2/3 6.25 ns 6.45 ns 0.97
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/6/81 24.1 ns 24.2 ns 1.00
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/7/4 10.9 ns 10.4 ns 1.05
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/9/3 12.0 ns 10.5 ns 1.14
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/22/5 14.0 ns 13.3 ns 1.05
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/58/2 13.1 ns 12.5 ns 1.05
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/75/85 202 ns 205 ns 0.99
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/102/4 16.8 ns 16.5 ns 1.02
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/200/46 281 ns 282 ns 1.00
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/325/1 25.1 ns 24.4 ns 1.03
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/400/50 593 ns 590 ns 1.01
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/1011/11 319 ns 315 ns 1.01
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/1280/46 1728 ns 1737 ns 0.99
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/1502/23 997 ns 991 ns 1.01
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/2203/54 3460 ns 3441 ns 1.01
bm<AlgType::str_member_first, char32_t, U'\x03B1'>/3056/7 535 ns 533 ns 1.00
bm<AlgType::str_member_last, char>/2/3 5.19 ns 5.14 ns 1.01
bm<AlgType::str_member_last, char>/6/81 21.3 ns 20.6 ns 1.03
bm<AlgType::str_member_last, char>/7/4 16.4 ns 16.1 ns 1.02
bm<AlgType::str_member_last, char>/9/3 13.1 ns 12.5 ns 1.05
bm<AlgType::str_member_last, char>/22/5 13.4 ns 13.0 ns 1.03
bm<AlgType::str_member_last, char>/58/2 14.9 ns 14.3 ns 1.04
bm<AlgType::str_member_last, char>/75/85 39.1 ns 38.4 ns 1.02
bm<AlgType::str_member_last, char>/102/4 17.0 ns 16.2 ns 1.05
bm<AlgType::str_member_last, char>/200/46 35.0 ns 34.6 ns 1.01
bm<AlgType::str_member_last, char>/325/1 34.9 ns 34.1 ns 1.02
bm<AlgType::str_member_last, char>/400/50 50.7 ns 46.9 ns 1.08
bm<AlgType::str_member_last, char>/1011/11 101 ns 85.5 ns 1.18
bm<AlgType::str_member_last, char>/1280/46 124 ns 102 ns 1.22
bm<AlgType::str_member_last, char>/1502/23 136 ns 110 ns 1.24
bm<AlgType::str_member_last, char>/2203/54 186 ns 183 ns 1.02
bm<AlgType::str_member_last, char>/3056/7 226 ns 217 ns 1.04
bm<AlgType::str_member_last, wchar_t>/2/3 5.37 ns 5.43 ns 0.99
bm<AlgType::str_member_last, wchar_t>/6/81 40.0 ns 41.4 ns 0.97
bm<AlgType::str_member_last, wchar_t>/7/4 9.52 ns 9.30 ns 1.02
bm<AlgType::str_member_last, wchar_t>/9/3 13.2 ns 13.4 ns 0.99
bm<AlgType::str_member_last, wchar_t>/22/5 13.8 ns 13.9 ns 0.99
bm<AlgType::str_member_last, wchar_t>/58/2 17.8 ns 18.2 ns 0.98
bm<AlgType::str_member_last, wchar_t>/75/85 49.0 ns 46.2 ns 1.06
bm<AlgType::str_member_last, wchar_t>/102/4 22.2 ns 21.6 ns 1.03
bm<AlgType::str_member_last, wchar_t>/200/46 42.8 ns 41.2 ns 1.04
bm<AlgType::str_member_last, wchar_t>/325/1 40.1 ns 35.9 ns 1.12
bm<AlgType::str_member_last, wchar_t>/400/50 59.3 ns 59.0 ns 1.01
bm<AlgType::str_member_last, wchar_t>/1011/11 131 ns 107 ns 1.22
bm<AlgType::str_member_last, wchar_t>/1280/46 144 ns 143 ns 1.01
bm<AlgType::str_member_last, wchar_t>/1502/23 179 ns 157 ns 1.14
bm<AlgType::str_member_last, wchar_t>/2203/54 252 ns 252 ns 1.00
bm<AlgType::str_member_last, wchar_t>/3056/7 319 ns 322 ns 0.99
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/2/3 11.8 ns 12.7 ns 0.93
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/6/81 23.2 ns 22.9 ns 1.01
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/7/4 8.98 ns 9.04 ns 0.99
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/9/3 13.1 ns 13.7 ns 0.96
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/22/5 13.8 ns 14.0 ns 0.99
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/58/2 17.7 ns 18.0 ns 0.98
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/75/85 165 ns 164 ns 1.01
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/102/4 25.7 ns 25.4 ns 1.01
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/200/46 234 ns 234 ns 1.00
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/325/1 64.7 ns 64.8 ns 1.00
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/400/50 527 ns 516 ns 1.02
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/1011/11 405 ns 400 ns 1.01
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/1280/46 1401 ns 1416 ns 0.99
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/1502/23 878 ns 874 ns 1.00
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/2203/54 2827 ns 2803 ns 1.01
bm<AlgType::str_member_last, wchar_t, L'\x03B1'>/3056/7 555 ns 557 ns 1.00
bm<AlgType::str_member_first_not, char>/2/3 5.36 ns 5.41 ns 0.99
bm<AlgType::str_member_first_not, char>/6/81 22.4 ns 21.9 ns 1.02
bm<AlgType::str_member_first_not, char>/7/4 13.7 ns 12.7 ns 1.08
bm<AlgType::str_member_first_not, char>/9/3 12.9 ns 11.9 ns 1.08
bm<AlgType::str_member_first_not, char>/22/5 13.4 ns 12.3 ns 1.09
bm<AlgType::str_member_first_not, char>/58/2 14.6 ns 13.9 ns 1.05
bm<AlgType::str_member_first_not, char>/75/85 40.2 ns 36.8 ns 1.09
bm<AlgType::str_member_first_not, char>/102/4 17.1 ns 15.6 ns 1.10
bm<AlgType::str_member_first_not, char>/200/46 35.4 ns 34.4 ns 1.03
bm<AlgType::str_member_first_not, char>/325/1 37.0 ns 34.7 ns 1.07
bm<AlgType::str_member_first_not, char>/400/50 67.2 ns 47.8 ns 1.41
bm<AlgType::str_member_first_not, char>/1011/11 103 ns 89.0 ns 1.16
bm<AlgType::str_member_first_not, char>/1280/46 125 ns 123 ns 1.02
bm<AlgType::str_member_first_not, char>/1502/23 128 ns 117 ns 1.09
bm<AlgType::str_member_first_not, char>/2203/54 188 ns 190 ns 0.99
bm<AlgType::str_member_first_not, char>/3056/7 228 ns 226 ns 1.01
bm<AlgType::str_member_first_not, wchar_t>/2/3 5.55 ns 5.62 ns 0.99
bm<AlgType::str_member_first_not, wchar_t>/6/81 47.4 ns 45.6 ns 1.04
bm<AlgType::str_member_first_not, wchar_t>/7/4 11.3 ns 11.1 ns 1.02
bm<AlgType::str_member_first_not, wchar_t>/9/3 14.6 ns 14.3 ns 1.02
bm<AlgType::str_member_first_not, wchar_t>/22/5 15.3 ns 14.9 ns 1.03
bm<AlgType::str_member_first_not, wchar_t>/58/2 19.4 ns 19.1 ns 1.02
bm<AlgType::str_member_first_not, wchar_t>/75/85 49.5 ns 50.3 ns 0.98
bm<AlgType::str_member_first_not, wchar_t>/102/4 21.6 ns 19.0 ns 1.14
bm<AlgType::str_member_first_not, wchar_t>/200/46 43.5 ns 42.9 ns 1.01
bm<AlgType::str_member_first_not, wchar_t>/325/1 42.2 ns 37.8 ns 1.12
bm<AlgType::str_member_first_not, wchar_t>/400/50 66.3 ns 61.1 ns 1.09
bm<AlgType::str_member_first_not, wchar_t>/1011/11 131 ns 108 ns 1.21
bm<AlgType::str_member_first_not, wchar_t>/1280/46 166 ns 145 ns 1.14
bm<AlgType::str_member_first_not, wchar_t>/1502/23 181 ns 161 ns 1.12
bm<AlgType::str_member_first_not, wchar_t>/2203/54 254 ns 249 ns 1.02
bm<AlgType::str_member_first_not, wchar_t>/3056/7 328 ns 326 ns 1.01
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/2/3 13.6 ns 13.8 ns 0.99
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/6/81 27.8 ns 27.2 ns 1.02
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/7/4 11.3 ns 10.8 ns 1.05
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/9/3 14.7 ns 14.2 ns 1.04
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/22/5 15.2 ns 15.1 ns 1.01
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/58/2 19.4 ns 19.3 ns 1.01
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/75/85 192 ns 179 ns 1.07
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/102/4 26.6 ns 26.4 ns 1.01
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/200/46 275 ns 253 ns 1.09
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/325/1 65.8 ns 66.1 ns 1.00
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/400/50 612 ns 581 ns 1.05
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/1011/11 431 ns 418 ns 1.03
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/1280/46 1621 ns 1527 ns 1.06
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/1502/23 846 ns 858 ns 0.99
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/2203/54 3236 ns 3053 ns 1.06
bm<AlgType::str_member_first_not, wchar_t, L'\x03B1'>/3056/7 553 ns 557 ns 0.99
bm<AlgType::str_member_last_not, char>/2/3 4.89 ns 5.03 ns 0.97
bm<AlgType::str_member_last_not, char>/6/81 21.4 ns 21.4 ns 1.00
bm<AlgType::str_member_last_not, char>/7/4 13.8 ns 12.7 ns 1.09
bm<AlgType::str_member_last_not, char>/9/3 12.7 ns 12.0 ns 1.06
bm<AlgType::str_member_last_not, char>/22/5 13.2 ns 12.7 ns 1.04
bm<AlgType::str_member_last_not, char>/58/2 14.6 ns 14.1 ns 1.04
bm<AlgType::str_member_last_not, char>/75/85 38.1 ns 38.8 ns 0.98
bm<AlgType::str_member_last_not, char>/102/4 16.5 ns 15.6 ns 1.06
bm<AlgType::str_member_last_not, char>/200/46 34.0 ns 33.8 ns 1.01
bm<AlgType::str_member_last_not, char>/325/1 36.0 ns 35.6 ns 1.01
bm<AlgType::str_member_last_not, char>/400/50 53.0 ns 47.1 ns 1.13
bm<AlgType::str_member_last_not, char>/1011/11 94.7 ns 85.9 ns 1.10
bm<AlgType::str_member_last_not, char>/1280/46 118 ns 107 ns 1.10
bm<AlgType::str_member_last_not, char>/1502/23 129 ns 117 ns 1.10
bm<AlgType::str_member_last_not, char>/2203/54 206 ns 190 ns 1.08
bm<AlgType::str_member_last_not, char>/3056/7 248 ns 230 ns 1.08
bm<AlgType::str_member_last_not, wchar_t>/2/3 5.36 ns 5.15 ns 1.04
bm<AlgType::str_member_last_not, wchar_t>/6/81 40.2 ns 39.5 ns 1.02
bm<AlgType::str_member_last_not, wchar_t>/7/4 9.36 ns 9.50 ns 0.99
bm<AlgType::str_member_last_not, wchar_t>/9/3 13.1 ns 13.0 ns 1.01
bm<AlgType::str_member_last_not, wchar_t>/22/5 13.9 ns 13.8 ns 1.01
bm<AlgType::str_member_last_not, wchar_t>/58/2 18.2 ns 18.2 ns 1.00
bm<AlgType::str_member_last_not, wchar_t>/75/85 45.6 ns 47.0 ns 0.97
bm<AlgType::str_member_last_not, wchar_t>/102/4 20.9 ns 18.9 ns 1.11
bm<AlgType::str_member_last_not, wchar_t>/200/46 42.0 ns 41.7 ns 1.01
bm<AlgType::str_member_last_not, wchar_t>/325/1 40.2 ns 36.9 ns 1.09
bm<AlgType::str_member_last_not, wchar_t>/400/50 66.4 ns 58.5 ns 1.14
bm<AlgType::str_member_last_not, wchar_t>/1011/11 128 ns 106 ns 1.21
bm<AlgType::str_member_last_not, wchar_t>/1280/46 143 ns 142 ns 1.01
bm<AlgType::str_member_last_not, wchar_t>/1502/23 176 ns 161 ns 1.09
bm<AlgType::str_member_last_not, wchar_t>/2203/54 255 ns 251 ns 1.02
bm<AlgType::str_member_last_not, wchar_t>/3056/7 320 ns 335 ns 0.96
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/2/3 11.1 ns 11.4 ns 0.97
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/6/81 28.0 ns 28.4 ns 0.99
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/7/4 11.4 ns 11.8 ns 0.97
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/9/3 13.0 ns 13.1 ns 0.99
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/22/5 13.6 ns 13.8 ns 0.99
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/58/2 18.4 ns 18.6 ns 0.99
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/75/85 211 ns 207 ns 1.02
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/102/4 25.9 ns 25.8 ns 1.00
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/200/46 309 ns 306 ns 1.01
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/325/1 64.5 ns 65.6 ns 0.98
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/400/50 690 ns 699 ns 0.99
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/1011/11 621 ns 644 ns 0.96
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/1280/46 1908 ns 1941 ns 0.98
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/1502/23 1318 ns 1331 ns 0.99
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/2203/54 4161 ns 3752 ns 1.11
bm<AlgType::str_member_last_not, wchar_t, L'\x03B1'>/3056/7 572 ns 569 ns 1.01

🥇 Results interpretation

  • The 1011/11 cases are the target cases; glad to see them improved.
  • Some of the 1280/46 and 1502/23 cases are improved, but not all of them consistently. I attribute that to the __forceinline effect.
  • The wchar_t/325/1 cases should have been forwarded to the usual find, but here we are, and the optimized bitmap improved them.
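
On the last point, reading /325/1 as a needle of length 1 (an assumption about the benchmark arguments), forwarding would look roughly like the wrapper below. The function name is hypothetical and this is not something this PR does.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical wrapper: a one-character needle makes find_first_of
// equivalent to a plain single-character find.
inline std::size_t find_first_of_forwarding(std::wstring_view haystack, std::wstring_view needle) {
    if (needle.size() == 1) {
        return haystack.find(needle.front()); // no bitmap needed at all
    }
    return haystack.find_first_of(needle);
}
```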

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner February 8, 2026 12:25
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Feb 8, 2026
@StephanTLavavej StephanTLavavej added the performance Must go faster label Feb 8, 2026
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Feb 8, 2026
@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Feb 9, 2026
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo. Please notify me if any further changes are pushed, otherwise no action is required.
