Quite different performance of pairwise on CPU vs GPU #143

@xukai92

I found that pairwise with SqEuclidean is faster than my own implementation on the CPU but slower on the GPU. Any idea why, and is there a possible optimization on the Distances.jl side?

MWE:

using Test, BenchmarkTools, Distances, CuArrays

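# Squared Euclidean distances between the columns of x,
# using ‖xi - xj‖² = ‖xi‖² + ‖xj‖² - 2⟨xi, xj⟩
function pairwise_dot_kai(x)
    d, n = size(x)
    xixj = x' * x                  # n×n matrix of inner products
    xsq = sum(x .^ 2; dims=1)      # 1×n row of squared column norms
    return repeat(xsq, n, 1) + repeat(xsq', 1, n) - 2xixj
end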

pairwise_dot(x) = pairwise(SqEuclidean(), x; dims=2)

xbench = randn(Float32, 784, 200);

@benchmark pairwise_dot_kai(xbench)
BenchmarkTools.Trial: 
  memory estimate:  1.52 MiB
  allocs estimate:  17
  --------------
  minimum time:     854.227 μs (0.00% GC)
  median time:      1.183 ms (0.00% GC)
  mean time:        1.361 ms (12.37% GC)
  maximum time:     125.259 ms (98.46% GC)
  --------------
  samples:          3662
  evals/sample:     1
@benchmark pairwise_dot(xbench)
BenchmarkTools.Trial: 
  memory estimate:  166.59 KiB
  allocs estimate:  204
  --------------
  minimum time:     359.751 μs (0.00% GC)
  median time:      406.615 μs (0.00% GC)
  mean time:        458.925 μs (6.46% GC)
  maximum time:     104.066 ms (99.27% GC)
  --------------
  samples:          10000
  evals/sample:     1

# Move the data to the GPU and rerun both benchmarks
xbench = xbench |> cu;
@benchmark pairwise_dot_kai(xbench)
BenchmarkTools.Trial: 
  memory estimate:  1.20 MiB
  allocs estimate:  19424
  --------------
  minimum time:     19.042 ms (0.00% GC)
  median time:      20.028 ms (0.00% GC)
  mean time:        21.811 ms (3.62% GC)
  maximum time:     52.425 ms (38.23% GC)
  --------------
  samples:          230
  evals/sample:     1
@benchmark pairwise_dot(xbench)
BenchmarkTools.Trial: 
  memory estimate:  10.99 MiB
  allocs estimate:  240635
  --------------
  minimum time:     453.229 ms (0.00% GC)
  median time:      470.074 ms (0.00% GC)
  mean time:        474.353 ms (2.67% GC)
  maximum time:     499.969 ms (6.04% GC)
  --------------
  samples:          11
  evals/sample:     1
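
For reference, the identity used in pairwise_dot_kai can also be written with broadcasting instead of repeat, so that only a matrix product, a reduction and one fused elementwise operation are involved. The following is a minimal sketch, not something Distances.jl provides: the name pairwise_sqeuclidean_bcast is hypothetical, and I have only reasoned about it for plain CPU arrays. Whether it helps on a CuArray is an assumption that would need to be benchmarked.

```julia
# Hypothetical helper (not part of Distances.jl): squared Euclidean
# distances between the columns of x, written with broadcasting only.
function pairwise_sqeuclidean_bcast(x::AbstractMatrix)
    xixj = x' * x                     # n×n matrix of inner products
    xsq = sum(abs2, x; dims=1)        # 1×n row of squared column norms
    xsq_col = reshape(xsq, :, 1)      # same values as an n×1 column
    # ‖xi - xj‖² = ‖xi‖² + ‖xj‖² - 2⟨xi, xj⟩; broadcasting expands the
    # n×1 column and the 1×n row to n×n without calling `repeat`.
    return xsq_col .+ xsq .- 2 .* xixj
end

x = randn(Float32, 784, 200)
D = pairwise_sqeuclidean_bcast(x)     # 200×200 matrix
```

Because this formula subtracts large, nearly equal numbers, entries that should be exactly zero (the diagonal) can come out as small positive or negative values in Float32. Also, GPU kernels are launched asynchronously, so GPU timings are usually taken with a synchronizing wrapper such as CuArrays.@sync around the benchmarked call; the numbers above were measured without one.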
