| 1 | +# Benchmark of xarray versus the groupby of CommonDataModel |
| 2 | + |
| 3 | + |
| 4 | +The test case is a 3D array of size 360x180x10959 (a global dataset at one-degree resolution representing 30 years of daily data). |
| 5 | +The data are normally distributed random values (mean = 100, variance = 1) stored as single-precision floats (`Float32`). |
| 6 | +We compute the mean and standard deviation (std) of the data grouped by month. |
| 7 | + |
| 8 | +Accuracy is assessed by comparing against the built-in functions (Statistics.jl or numpy) evaluated in double precision (`Float64`). |
| 9 | +Note that Julia’s `mean`/`std` give exactly the same results as their numpy equivalents. |
| 10 | + |
| 11 | +All runs use 1 CPU core and xarray’s default implementations (i.e. no dask…). |
| 12 | +Each benchmark is run for 30 trials and the minimum time is reported here. |
| 13 | +Environment: Ubuntu 22.04, Julia 1.11, Python 3.10.12, xarray 2024.10 |
| 14 | + |
| 15 | + |
| 16 | +Creation of the data file: |
| 17 | + |
| 18 | +```bash |
| 19 | +julia test_perf_init.jl |
| 20 | +``` |
| 21 | + |
| 22 | +Get root privileges (needed to drop the file cache): |
| 23 | + |
| 24 | +```bash |
| 25 | +sudo -s |
| 26 | +export HOME=/home/abarth |
| 27 | +cd ~/.julia/dev/CommonDataModel/test/perf |
| 28 | +``` |
| 29 | + |
| 30 | +## Laptop with an i5-1135G7 CPU and NVMe SSD WDC WDS100T2B0C |
| 31 | + |
| 32 | + |
| 33 | +### CommonDataModel |
| 34 | + |
| 35 | +```bash |
| 36 | +~/.juliaup/bin/julia test_perf_cdm.jl |
| 37 | +``` |
| 38 | + |
| 39 | +Output: |
| 40 | + |
| 41 | +``` |
| 42 | +runtime of mean |
| 43 | + 2.133 s (1686528 allocations: 2.71 GiB) |
| 44 | +runtime of std |
| 45 | + 2.574 s (1686525 allocations: 2.72 GiB) |
| 46 | +accuracy |
| 47 | +sqrt(mean((gm - mean_ref) .^ 2)) = 4.643795867042341e-5 |
| 48 | +sqrt(mean((gs - std_ref) .^ 2)) = 9.251281717683748e-7 |
| 49 | +``` |
| 50 | + |
| 51 | + |
| 52 | +### xarray |
| 53 | + |
| 54 | +```bash |
| 55 | +python3 test_perf_xarray.py |
| 56 | +``` |
| 57 | + |
| 58 | +Output: |
| 59 | + |
| 60 | +``` |
| 61 | +python: 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] |
| 62 | +xarray: 2024.10.0 |
| 63 | +numpy: 1.26.1 |
| 64 | +runtime |
| 65 | + minimum time of <function mean_no_cache at 0x741456563d90> : 4.260775363999983 |
| 66 | + minimum time of <function std_no_cache at 0x74143df69240> : 5.453749345000006 |
| 67 | +accuracy |
| 68 | + accuracy of mean 4.64379586704234e-05 |
| 69 | + accuracy of std 2.1211715725139616e-07 |
| 70 | +``` |
| 71 | + |
| 72 | + |
| 73 | +## Workstation with an i7-7700 CPU and SATA SSD (WD Green 120G) |
| 74 | + |
| 75 | +```bash |
| 76 | +~/.juliaup/bin/julia test_perf_init.jl |
| 77 | +~/.juliaup/bin/julia test_perf_cdm.jl |
| 78 | +python3 test_perf_xarray.py |
| 79 | +``` |
| 80 | + |
| 81 | +Output: |
| 82 | + |
| 83 | +``` |
| 84 | +runtime |
| 85 | + 7.177 s (1686528 allocations: 2.71 GiB) |
| 86 | + 8.090 s (1686525 allocations: 2.72 GiB) |
| 87 | +accuracy |
| 88 | +sqrt(mean((gm - mean_ref) .^ 2)) = 4.6300139982730906e-5 |
| 89 | +sqrt(mean((gs - std_ref) .^ 2)) = 9.268973317814482e-7 |
| 90 | +python: 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] |
| 91 | +xarray: 2024.10.0 |
| 92 | +numpy: 1.26.2 |
| 93 | +runtime |
| 94 | + minimum time of <function mean_no_cache at 0x7f54cb64bd90> : 8.740452307043597 |
| 95 | + minimum time of <function std_no_cache at 0x7f54b31a0a60> : 10.462690721964464 |
| 96 | +accuracy |
| 97 | + accuracy of mean 4.6300139982730906e-05 |
| 98 | + accuracy of std 2.12226305970758e-07 |
| 99 | +``` |
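
Note on the protocol: the description in the diff reports the minimum time over 30 trials and measures accuracy as the root-mean-square difference from a double-precision reference. The following is a minimal Python sketch of those two ideas, using only the standard library and a small synthetic series. It is not the code in `test_perf_xarray.py` or `test_perf_cdm.jl`. The helper names (`to_float32`, `rmse`, `min_time`, `grouped_mean`), the sample size and the naive single-precision accumulation are all assumptions made for illustration.

```python
import math
import random
import struct
import time


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def rmse(estimate, reference):
    """Root-mean-square difference between two equally long sequences."""
    return math.sqrt(
        sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)
    )


def min_time(func, trials=30):
    """Call func `trials` times and return the fastest wall-clock time in seconds."""
    best = math.inf
    for _ in range(trials):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def grouped_mean(values, groups, ngroups, single_precision=False):
    """Mean per group; optionally round the running sum to single precision."""
    sums = [0.0] * ngroups
    counts = [0] * ngroups
    for v, g in zip(values, groups):
        s = sums[g] + v
        sums[g] = to_float32(s) if single_precision else s
        counts[g] += 1
    return [s / c for s, c in zip(sums, counts)]


random.seed(0)
n = 12_000                                   # hypothetical size, far smaller than the benchmark
groups = [i % 12 for i in range(n)]          # stand-in for the month of each sample
data = [to_float32(random.gauss(100.0, 1.0)) for _ in range(n)]  # mean 100, std 1

mean_ref = grouped_mean(data, groups, 12)                         # double-precision reference
mean_f32 = grouped_mean(data, groups, 12, single_precision=True)  # single-precision sums

accuracy = rmse(mean_f32, mean_ref)
elapsed = min_time(
    lambda: grouped_mean(data, groups, 12, single_precision=True), trials=5
)
print(f"accuracy of mean {accuracy}")
print(f"minimum time {elapsed}")
```

The printed numbers depend on the machine and on the accumulation order, so they are not comparable with the outputs in the diff. The sketch only shows how the two reported quantities are defined.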