After seeing #5735, @lemire pointed me to Batched Ranged Random Integer Generation published in Aug 2024 by Nevin Brackett-Rozinsky and himself, which sounds very promising for the STL:
Pseudorandom values are often generated as 64-bit binary words. These random words need to be converted into ranged values without statistical bias. We present an efficient algorithm to generate multiple independent uniformly-random bounded integers from a single uniformly-random binary word, without any bias. In the common case, our method uses one multiplication and no division operations per value produced. In practice, our algorithm can more than double the speed of unbiased random shuffling for small to moderately large arrays.
He expects that it would probably take just a few lines of code.