Significant slowdown of nanmean over slow axis dimensions #54

@JamesWrigley

Description

I've run into a case on some large-ish arrays where nanmean is quite slow compared to mean:

using Statistics, NaNStatistics

data = rand(Float64, (2880, 2880, 700));
@time mean(data; dims=3)
@time nanmean(data; dims=3)

2.132474 seconds (9 allocations: 63.282 MiB)
85.518496 seconds (3 allocations: 63.281 MiB)

Whereas with numpy:

import numpy as np

# Note the flipped dimensions because of row-major vs column-major ordering
data = np.random.rand(700, 2880, 2880).astype(np.float64)
%time np.mean(data, axis=0)
%time np.nanmean(data, axis=0)

CPU times: user 2.12 s, sys: 9.91 ms, total: 2.13 s
Wall time: 2.14 s
CPU times: user 9.28 s, sys: 3.86 s, total: 13.1 s
Wall time: 13.2 s

I assume this is what #48 (comment) refers to. Do you reckon there's any way of recovering that performance?
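
For reference, one possible way to recover some of it might be a hand-rolled reduction that walks the array in column-major order and accumulates per-position sums and counts, so the innermost loop strides contiguously through memory instead of jumping across the slow axis. This is just a rough sketch, not how NaNStatistics is implemented; nanmean_dim3 is a made-up helper name and only covers the dims=3 case for a 3-D array:

function nanmean_dim3(data::AbstractArray{T,3}) where {T<:AbstractFloat}
    nx, ny, nz = size(data)
    sums   = zeros(T, nx, ny)
    counts = zeros(Int, nx, ny)
    # Keep i innermost so memory access stays contiguous (column-major order)
    @inbounds for k in 1:nz, j in 1:ny, i in 1:nx
        v = data[i, j, k]
        if !isnan(v)
            sums[i, j] += v
            counts[i, j] += 1
        end
    end
    # Positions with no non-NaN values become NaN via 0/0;
    # reshape to match the (nx, ny, 1) shape of nanmean(data; dims=3)
    return reshape(sums ./ counts, nx, ny, 1)
end

@time nanmean_dim3(data)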
