With reference to the example in this notebook, this weekend I compared the performance of the MERGESORT method vs. the NAIVE method with a toy dataset of 8 columns × 21 rows:
import numpy as np
import dcor

%%timeit
dc = np.apply_along_axis(
    lambda col1: np.apply_along_axis(
        lambda col2: dcor.distance_correlation(col1, col2, method='NAIVE'),
        axis=0, arr=data),
    axis=0, arr=data)
>>> 24.3 ms ± 334 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
vs:
%%timeit
dc = np.apply_along_axis(
    lambda col1: np.apply_along_axis(
        lambda col2: dcor.distance_correlation(col1, col2, method='MERGESORT'),
        axis=0, arr=data),
    axis=0, arr=data)
>>> 17.4 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
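As an aside, since distance correlation is symmetric in its arguments, the nested apply_along_axis computes every pair twice (plus the diagonal). A minimal sketch of computing only the upper triangle and mirroring it, which should roughly halve the number of dcor calls (pairwise_dcor is a helper I made up, not part of dcor's API):

import itertools

import dcor
import numpy as np

def pairwise_dcor(data, method='MERGESORT'):
    # Hypothetical helper: distance-correlation matrix over the columns
    # of data, computing each unordered pair once and mirroring it.
    n_cols = data.shape[1]
    out = np.ones((n_cols, n_cols))  # dcor(x, x) == 1 for non-constant columns
    for i, j in itertools.combinations(range(n_cols), 2):
        r = dcor.distance_correlation(data[:, i], data[:, j], method=method)
        out[i, j] = out[j, i] = r
    return out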
Since I sometimes work with many thousands of rows, and possibly more columns, I wonder if there is a way to similarly improve the speed of the pairwise p-value calculation:
%%timeit
p = np.apply_along_axis(
    lambda col1: np.apply_along_axis(
        lambda col2: dcor.independence.distance_covariance_test(
            col1, col2, exponent=1.0, num_resamples=2000)[0],
        axis=0, arr=data),
    axis=0, arr=data)
>>> 4.38 s ± 119 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
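One idea I have been experimenting with: the permutation tests for distinct column pairs are independent of each other, so they can run in parallel, and again only the upper triangle is needed. A sketch assuming joblib is available (pairwise_pvalues is a hypothetical helper, not dcor's API):

import itertools

import dcor
import numpy as np
from joblib import Parallel, delayed

def pairwise_pvalues(data, num_resamples=2000, exponent=1.0, n_jobs=-1):
    # Hypothetical helper: permutation p-values for all distinct column
    # pairs, run in parallel. The matrix is symmetric, so each pair is
    # tested once and the result is mirrored.
    n_cols = data.shape[1]
    pairs = list(itertools.combinations(range(n_cols), 2))
    results = Parallel(n_jobs=n_jobs)(
        delayed(dcor.independence.distance_covariance_test)(
            data[:, i], data[:, j],
            exponent=exponent, num_resamples=num_resamples,
        )
        for i, j in pairs
    )
    p = np.zeros((n_cols, n_cols))  # diagonal left at 0 (a column vs. itself)
    for (i, j), res in zip(pairs, results):
        p[i, j] = p[j, i] = res[0]  # index [0] is the p-value, as in the call above
    return p

With n_jobs=-1 this should scale with the number of cores, and halving the pair count already saves half the calls; how much it helps in practice presumably depends on the per-pair cost relative to the process overhead.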