Skip to content

Slow permutations #95

@natemcintosh

Description

@natemcintosh

permutations() seems slower than it should be. Below is the example I discovered this with.

I have an array of strings, of length ~2,000. I would like to produce all permutations of lengths 1 to n (n in my application will always be less than or equal to 2).

My attempt is

import Combinatorics

"Return all permutations of sizes 1:`max_length`"
function power_permutations(iterable, max_length)    
    Iterators.flatten(Combinatorics.permutations(iterable, r) for r in 1:max_length)
end

and simply running through it we get

using BenchmarkTools

@benchmark foreach(identity, power_permutations(collect(1:2_000), 2))

BenchmarkTools.Trial: 
  memory estimate:  60.80 GiB
  allocs estimate:  20000015
  --------------
  minimum time:     17.592 s (36.62% GC)
  median time:      17.592 s (36.62% GC)
  mean time:        17.592 s (36.62% GC)
  maximum time:     17.592 s (36.62% GC)
  --------------
  samples:          1
  evals/sample:     1

Woah, that is a lot of memory.

Comparing to a similar version in Python.

import itertools

def power_permutations(iterable, max_length):
    "Return all permutations of sizes 1:`max_length`"
    for r in range(max_length + 1):
        yield from itertools.permutations(iterable, r)

%timeit len(list(power_permutations(list(range(2000)), 2)))                                                                                       
484 ms ± 8.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

A good deal faster than the Julia version.

With arrays of strings. In Julia:

@benchmark foreach(identity, power_permutations(collect(Iterators.repeated("hello",2_000)), 2))
BenchmarkTools.Trial: 
  memory estimate:  60.80 GiB
  allocs estimate:  20000015
  --------------
  minimum time:     17.404 s (36.79% GC)
  median time:      17.404 s (36.79% GC)
  mean time:        17.404 s (36.79% GC)
  maximum time:     17.404 s (36.79% GC)
  --------------
  samples:          1
  evals/sample:     1

And in Python:

%timeit len(list(power_permutations(["hello"]*2000, 2)))                                                                                          
473 ms ± 3.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

We see more or less the same performance.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions