Improve performance of random.randbelow by providing a C implementation #126149

@eendebakpt

Feature or enhancement

Proposal:

We can improve the performance of random.randbelow (and of methods that depend on it, such as random.shuffle) by creating a C implementation of _random._randbelow_with_getrandbits. A quick prototype, written as a straightforward port of the Python code, shows a speedup of a factor of 2 or better in many cases.
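For context, here is a minimal sketch of the rejection-sampling approach that a getrandbits-based randbelow can use. The helper names `randbelow_sketch` and `expected_draws` are hypothetical; this is neither the CPython source nor the prototype's C code. It is only meant to show how little work there is per call, and why the number of `getrandbits` calls depends on how close `n` is to the next power of two.

```python
import random


def randbelow_sketch(rng, n):
    """Return a random int in [0, n) by rejection sampling; requires n > 0."""
    k = n.bit_length()          # smallest k such that n < 2**k
    r = rng.getrandbits(k)      # uniform in [0, 2**k)
    while r >= n:               # reject out-of-range draws and retry
        r = rng.getrandbits(k)
    return r


def expected_draws(n):
    """Average number of getrandbits calls per result: 2**k / n."""
    return 2 ** n.bit_length() / n


rng = random.Random(12345)
samples = [randbelow_sketch(rng, 10) for _ in range(1000)]
print(min(samples), max(samples))

# 123 sits just below 2**7, while 129 sits just above it and needs 8 bits,
# so 129 rejects roughly half of its draws.
print(expected_draws(123), expected_draws(129))
```

Under this scheme a value such as 129 needs almost two draws per result on average, whereas 123 needs barely more than one, so per-call timings can vary with `n` for reasons unrelated to interpreter overhead.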

Benchmark:

rng.getrandbits(8) 59.75599982775748 [ns]

rng._randbelow_with_getrandbits(4) 230.81 [ns]
rng._randbelow_with_getrandbits_c(4) 83.28 [ns]

rng._randbelow_with_getrandbits(10) 208.89 [ns]
rng._randbelow_with_getrandbits_c(10) 83.92 [ns]

rng._randbelow_with_getrandbits(123) 179.30 [ns]
rng._randbelow_with_getrandbits_c(123) 60.23 [ns]

rng._randbelow_with_getrandbits(129) 225.87 [ns]
rng._randbelow_with_getrandbits_c(129) 79.82 [ns]

rng._randbelow_with_getrandbits(314) 225.01 [ns]
rng._randbelow_with_getrandbits_c(314) 112.94 [ns]

rng._randbelow_with_getrandbits(10**20) 324.53 [ns]
rng._randbelow_with_getrandbits_c(10**20) 135.60 [ns]

rng._randbelow_with_getrandbits(10**250) 1233.00 [ns]
rng._randbelow_with_getrandbits_c(10**250) 1048.73 [ns]

shuffle(x) 442.07 [us]
shuffle_c(x) 204.40 [us]
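To make the "factor 2 or better" claim easier to check, the following sketch computes the speedup ratios directly from the timings listed above. The dictionary layout and the `speedup` helper are illustrative choices; the numbers are copied from the benchmark output and nothing is re-measured.

```python
# Timings copied from the benchmark output: (Python version, C prototype).
timings_ns = {
    "4": (230.81, 83.28),
    "10": (208.89, 83.92),
    "123": (179.30, 60.23),
    "129": (225.87, 79.82),
    "314": (225.01, 112.94),
    "10**20": (324.53, 135.60),
    "10**250": (1233.00, 1048.73),
}
shuffle_us = (442.07, 204.40)


def speedup(python_time, c_time):
    """Ratio of the Python timing to the C timing (higher means faster C)."""
    return python_time / c_time


for label, (py, c) in timings_ns.items():
    print(f"n={label}: {speedup(py, c):.2f}x")
print(f"shuffle: {speedup(*shuffle_us):.2f}x")
```

On these numbers the ratio is above 2 for most bounds, dips just under 2 for 314, and falls to roughly 1.2 for 10**250, where the cost of producing the large integer itself presumably dominates.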

Prototype: main...eendebakpt:randbelow_c

Benchmark script

import random
import _random
import timeit

rng = random.Random()

number=100_000
dt=timeit.timeit('rng.getrandbits(8)', globals=globals(), number=number)
print(f'rng.getrandbits(8) {1e9*dt/number} [ns]')
print()


for a in [4, 10, 123, 129, 314, 10**20, 10**250]:
    # Use fewer repetitions for the very large bounds.
    number = 100 if a > 1e8 else 100_000

    # Keep the large powers readable in the timed statement and in the output.
    astr = a
    if a == 10**250:
        astr = '10**250'
    if a == 10**20:
        astr = '10**20'
    dt = timeit.timeit(f'rng._randbelow_with_getrandbits({astr})', globals=globals(), number=number)
    print(f'rng._randbelow_with_getrandbits({astr}) {1e9*dt/number:.2f} [ns]')

    # The C variant only exists on the prototype branch.
    if hasattr(rng, '_randbelow_with_getrandbits_c'):
        dt = timeit.timeit(f'rng._randbelow_with_getrandbits_c({astr})', globals=globals(), number=number)
        print(f'rng._randbelow_with_getrandbits_c({astr}) {1e9*dt/number:.2f} [ns]')

    print()


def shuffle(x):
    """Shuffle list x in place, and return None."""

    randbelow = rng._randbelow_with_getrandbits
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = randbelow(i + 1)
        x[i], x[j] = x[j], x[i]

number=1000
x = list(range(1600))
dt=timeit.timeit('shuffle(x)', globals=globals(), number=number)
print(f'shuffle(x) {1e6*dt/number:.2f} [us]')

if hasattr(rng, '_randbelow_with_getrandbits_c'):

    def shuffle_c(x):
        """Shuffle list x in place, and return None."""
    
        randbelow = rng._randbelow_with_getrandbits_c
        for i in reversed(range(1, len(x))):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = randbelow(i + 1)
            x[i], x[j] = x[j], x[i]
    
    number=1000
    x = list(range(1600))
    dt=timeit.timeit('shuffle_c(x)', globals=globals(), number=number)
    print(f'shuffle_c(x) {1e6*dt/number:.2f} [us]')

@rhettinger In your opinion, is the speed-up from the C implementation worth the additional complexity it brings? If so, I will clean up the code and open a PR.

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response


Labels: extension-modules (C modules in the Modules dir), performance (Performance or resource usage), type-feature (A feature request or enhancement)
