Commit 77702b3
Improve performance of `dnp.nan_to_num`
This PR adds a dedicated kernel for `dnp.nan_to_num` to improve its
performance. This reduces the number of kernel calls to at most one in
all cases.
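To illustrate why a single dedicated kernel helps, here is a minimal sketch using NumPy as a stand-in for device arrays. The helper names `nan_to_num_multi_pass` and `nan_to_num_single_pass` are hypothetical and are not taken from dpnp. The multi-pass version is only one plausible composition from separate elementwise operations, not the previous dpnp implementation. On a device, each of its steps would typically be its own kernel launch with temporary arrays.

```python
import numpy as np


def nan_to_num_multi_pass(x, nan=0.0):
    """Hypothetical composition from separate elementwise operations."""
    info = np.finfo(x.dtype)
    out = np.where(np.isnan(x), nan, x)              # mask + select
    out = np.where(np.isposinf(out), info.max, out)  # mask + select
    out = np.where(np.isneginf(out), info.min, out)  # mask + select
    return out


def nan_to_num_single_pass(x, nan=0.0):
    """Per-element logic that a fused kernel could apply in one sweep."""
    info = np.finfo(x.dtype)
    out = np.empty_like(x)
    for i, v in enumerate(x.flat):
        if np.isnan(v):
            out.flat[i] = nan
        elif v == np.inf:
            out.flat[i] = info.max
        elif v == -np.inf:
            out.flat[i] = info.min
        else:
            out.flat[i] = v
    return out


x_demo = np.array([1.0, np.nan, np.inf, -np.inf, -2.5])
multi = nan_to_num_multi_pass(x_demo)
single = nan_to_num_single_pass(x_demo)
```

Both helpers agree with `np.nan_to_num` on this input; the difference is that the second reads and writes each element exactly once.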
Kernels for both strided and contiguous inputs have been added, to
avoid allocating additional device memory for trivial strides when the
input is fully C- or F-contiguous.
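The layout check behind that choice can be sketched as follows. This is an illustration under assumptions: `pick_kernel` is a hypothetical helper, not a dpnp function, and NumPy arrays stand in for device arrays. The actual dispatch lives in the C++ extension and may differ in detail.

```python
import numpy as np


def pick_kernel(x):
    """Hypothetical dispatch: name the code path a given layout would take."""
    if x.flags.c_contiguous or x.flags.f_contiguous:
        # Elements occupy one dense block, so no stride metadata is needed.
        return "contiguous"
    # General case: per-dimension strides are needed to locate each element.
    return "strided"


a = np.zeros((4, 6))
path_c = pick_kernel(a)                 # C-contiguous array
path_f = pick_kernel(a.T)               # transpose is F-contiguous
path_strided = pick_kernel(a[:, ::2])   # every other column: not contiguous
```

Only the last case needs shape and stride information to be passed along, which is the extra allocation the contiguous kernel avoids.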
As an example of the performance gains, here are timings on a Max GPU.
On master:
```python
In [1]: import dpnp as dnp
In [2]: import numpy as np
In [3]: x_np = np.random.randn(10**9)
In [4]: x_np[np.random.choice(x_np.size, 200, replace=False)] = np.nan
In [5]: x = dnp.asarray(x_np)
In [6]: q = x.sycl_queue
In [7]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 394 ms, sys: 43.8 ms, total: 438 ms
Wall time: 304 ms
In [8]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 333 ms, sys: 31.8 ms, total: 364 ms
Wall time: 134 ms
```
On this branch:
```python
In [8]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 49.6 ms, sys: 8.1 ms, total: 57.7 ms
Wall time: 60.9 ms
In [9]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 22.9 ms, sys: 16 ms, total: 38.9 ms
Wall time: 19.7 ms
```
PR #2228, 1 parent 5b140db, commit 77702b3
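For reference, the wall times reported in the two sessions above can be turned into speedup factors. This is plain arithmetic on the numbers already shown; the variable names are illustrative only.

```python
# Wall times in milliseconds, copied from the timing sessions above.
master_ms = {"first call": 304.0, "second call": 134.0}
branch_ms = {"first call": 60.9, "second call": 19.7}

# Speedup = time on master / time on branch, for the matching call.
speedup = {k: master_ms[k] / branch_ms[k] for k in master_ms}

for call, factor in speedup.items():
    print(f"{call}: {factor:.1f}x faster")
```

This gives roughly a 5x improvement on the first call and about 6.8x on the second.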
File tree
7 files changed, +770 -19 lines changed
- dpnp
- backend
- extensions/ufunc
- elementwise_functions
- kernels/elementwise_functions
- tests