---
title: 'Enabling GPU-native analytics with Xarray and kvikIO'
date: '2022-08-30'
authors:
  - name: Deepak Cherian
    github: dcherian
  - name: Wei Ji Leong
    github: weiji14
summary: 'An experiment with direct-to-GPU reads from a Zarr store using Xarray.'
---

## TLDR

We [demonstrate](https://github.com/xarray-contrib/cupy-xarray/pull/10) registering an Xarray backend that reads data from a Zarr store directly to GPU memory as [CuPy arrays](https://cupy.dev) using the new [kvikIO library](https://docs.rapids.ai/api/kvikio/stable/) and [GPU Direct Storage](https://developer.nvidia.com/blog/gpudirect-storage/) technology. This allows direct-to-GPU reads and GPU-native analytics on existing pipelines 🎉 😱 🤯 🥳.

## Background

### What is GPU Direct Storage?

Quoting [this NVIDIA blogpost](https://developer.nvidia.com/blog/gpudirect-storage/):

> I/O, the process of loading data from storage to GPUs for processing, has historically been controlled by the CPU. As computation shifts from slower CPUs to faster GPUs, I/O becomes more of a bottleneck to overall application performance. Just as GPUDirect RDMA (Remote Direct Memory Address) improved bandwidth and latency when moving data directly between a network interface card (NIC) and GPU memory, a new technology called GPUDirect Storage enables a direct data path between local or remote storage, like NVMe or NVMe over Fabric (NVMe-oF), and GPU memory. Both GPUDirect RDMA and GPUDirect Storage avoid extra copies through a bounce buffer in the CPU’s memory and enable a direct memory access (DMA) engine near the NIC or storage to move data on a direct path into or out of GPU memory, all without burdening the CPU or GPU. For GPUDirect Storage, storage location doesn’t matter; it could be inside an enclosure, within the rack, or connected over the network.

![Diagram showing standard path between GPU memory and CPU memory on the left, versus a direct data path between GPU memory and storage on the right](https://developer.nvidia.com/blog/wp-content/uploads/2019/08/GPUDirect-Fig-1-New.png)

### What is kvikIO?

> [kvikIO](https://github.com/rapidsai/kvikio) is a Python library providing bindings to [cuFile](https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html#introduction), which enables GPUDirect Storage (GDS).

For Xarray, the key bit is that kvikIO exposes a [Zarr store](https://docs.rapids.ai/api/kvikio/stable/api.html#zarr) called `GDSStore` that does all the hard work for us. Since Xarray knows how to read Zarr stores, we can adapt it to create a new storage backend that uses `kvikio`. And thanks to recent work funded by the [Chan Zuckerberg Initiative](https://xarray.dev/blog/czi-eoss-grant-conclusion), creating and registering a [new backend](https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html) is quite easy!
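
To get a feel for what `GDSStore` provides on its own, here is a minimal sketch of opening a Zarr store through kvikIO directly, without Xarray. The path matches the uncompressed store created in the demo below; whether reads come back as CuPy arrays also depends on the in-progress Zarr work described next.

```python
import kvikio.zarr
import zarr

# GDSStore behaves like a regular directory store, except that reads
# go through cuFile/GDS when available (and compatibility mode otherwise).
store = kvikio.zarr.GDSStore("./air-temperature.zarr")
group = zarr.open_group(store, mode="r")
print(group["air"])
```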

## Integrating with Xarray

Getting all these pieces to work together requires three in-progress pull requests that:

1. [Teach Zarr to handle alternative array classes](https://github.com/zarr-developers/zarr-python/pull/934)
2. [Rewrite a small bit of Xarray to not cast all data to a numpy array after reading from disk](https://github.com/pydata/xarray/pull/6874)
3. [Make a backend that connects Xarray to kvikIO](https://github.com/xarray-contrib/cupy-xarray/pull/10)

Writing the backend for Xarray was relatively easy, with most of the code copied over or inherited from the existing Zarr backend. We did have to ensure that dimension coordinates (for example, a `time` dimension with timestamps for a timeseries dataset) could be read directly into host memory (RAM) without raising an error (by default kvikIO loads all data to device memory). This is required because Xarray creates `pandas.Index` objects for such variables. In the future, we could consider using `cudf.Index` instead to allow a fully GPU-backed Xarray object.
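
For the curious, registering a backend boils down to subclassing `xarray.backends.BackendEntrypoint`. The skeleton below is purely illustrative (the class name and method bodies are ours, not necessarily the cupy-xarray implementation); see the [backend docs](https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html) for the full protocol.

```python
from xarray.backends import BackendEntrypoint


class KvikioBackendEntrypoint(BackendEntrypoint):
    """Illustrative skeleton only; the real code lives in cupy-xarray."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None, **kwargs):
        # Open the store with kvikio.zarr.GDSStore, wrap each Zarr array in
        # a lazy backend array, and assemble an xarray.Dataset from them.
        raise NotImplementedError

    def guess_can_open(self, filename_or_obj):
        # Only claim paths that look like Zarr stores.
        return str(filename_or_obj).endswith(".zarr")
```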

## Usage

Assuming you have all the pieces together (see [Appendix I](#appendix-i--step-by-step-install-instructions) and [Appendix II](#appendix-ii--making-sure-gds-is-working) for step-by-step instructions), using all this cool technology only requires adding `engine="kvikio"` to your `open_dataset` line (!)

```python
import xarray as xr

ds = xr.open_dataset("file.zarr", engine="kvikio", consolidated=False)
```

Notice that importing `cupy_xarray` was not needed. `cupy_xarray` uses [entrypoints](https://packaging.python.org/en/latest/specifications/entry-points/) to register the kvikIO backend with Xarray.
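
One quick way to confirm the entrypoint machinery worked is to ask Xarray which engines it knows about (`list_engines` is available in recent Xarray versions; the exact output depends on your environment):

```python
import xarray as xr

# "kvikio" should appear alongside "zarr", "netcdf4", etc.
print(xr.backends.list_engines())
```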

With this, `ds.load()` will load directly to GPU memory and `ds` will now contain CuPy arrays. At present there are a few limitations:

1. Zarr stores cannot be read with consolidated metadata, and
2. compression is unsupported by the kvikIO backend.

## Quick demo

First create an example uncompressed dataset to read from

```python
import xarray as xr

store = "./air-temperature.zarr"

airt = xr.tutorial.open_dataset("air_temperature", engine="netcdf4")

for var in airt.variables:
    airt[var].encoding["compressor"] = None
airt.to_zarr(store, mode="w", consolidated=True)
```

Now read

```python
# consolidated must be False
ds = xr.open_dataset(store, engine="kvikio", consolidated=False)
ds.air
```

```python
<xarray.DataArray 'air' (time: 2920, lat: 25, lon: 53)>
[3869000 values with dtype=float32]
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    GRIB_id:       11
    GRIB_name:     TMP
    actual_range:  [185.16000366210938, 322.1000061035156]
    dataset:       NMC Reanalysis
    level_desc:    Surface
    long_name:     4xDaily Air temperature at sigma level 995
    parent_stat:   Other
    precision:     2
    statistic:     Individual Obs
    units:         degK
    var_desc:      Air temperature
```

Note that we get Xarray's lazy backend arrays by default, and that the dimension coordinate variables `lat`, `lon`, `time` were read. At this point, this looks identical to what we get with a standard `xr.open_dataset(store, engine="zarr")` command.

Now load a small subset

```python
type(ds["air"].isel(time=0, lat=10).load().data)
```

```
cupy._core.core.ndarray
```

Success! 🎉 😱 🤯 🥳

Xarray integrates [decently well](https://cupy-xarray.readthedocs.io/quickstart.html) with CuPy arrays so you should be able to test out existing analysis pipelines pretty easily.
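
For instance, a reduction should run entirely on the GPU and return CuPy-backed results. A minimal sketch (`cupy.asnumpy` pulls data back to host memory when a CPU-only library, like matplotlib, needs it):

```python
import cupy as cp

# Compute a time mean on the GPU; the result is still CuPy-backed
mean = ds.air.mean("time").load()
print(type(mean.data))  # cupy.ndarray

# Copy back to host memory (NumPy) when needed
mean_cpu = mean.copy(data=cp.asnumpy(mean.data))
```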

## Cool demo

See above! 😆 We don't have a more extensive analysis demo yet but are looking to develop one very soon! The limiting step here is access to capable hardware.

Reach out [on the Pangeo discourse forum](https://discourse.pangeo.io/tag/machine-learning) or over at [cupy-xarray](https://github.com/xarray-contrib/cupy-xarray) if you have ideas. We would love to hear from you.

## Summary

We demonstrate integrating the kvikIO library using Xarray's new backend entrypoints. With everything set up, simply adding `engine="kvikio"` enables direct-to-GPU reads from disk or over the network.

## Acknowledgments

This experiment was supported by funding from NASA-OSTFL 80NSSC22K0345 "Enhancing analysis of NASA data with the open-source Python Xarray Library".

## Appendix I : Step-by-step install instructions

[Wei Ji Leong](https://github.com/weiji14) helpfully [provided steps](https://github.com/xarray-contrib/cupy-xarray/pull/10#issuecomment-1218374773) to get started on your machine:

```bash
# May need to install nvidia-gds first
# https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu-installation-common
sudo apt install nvidia-gds

git clone https://github.com/dcherian/cupy-xarray.git
cd cupy-xarray

mamba create --name cupy-xarray python=3.9 cupy=11.0 rapidsai-nightly::kvikio=22.10 jupyterlab=3.4.5 pooch=1.6.0 netcdf4=1.6.0 watermark=2.3.1
mamba activate cupy-xarray
python -m ipykernel install --user --name cupy-xarray

# https://github.com/pydata/xarray/pull/6874
pip install git+https://github.com/dcherian/xarray.git@kvikio
# https://github.com/zarr-developers/zarr-python/pull/934
pip install git+https://github.com/madsbk/zarr-python.git@cupy_support
# https://github.com/xarray-contrib/cupy-xarray/pull/10
git switch kvikio-entrypoint
pip install --editable=.

# Start jupyter lab
jupyter lab --no-browser
# Then open the docs/kvikio.ipynb notebook
```

## Appendix II : making sure GDS is working

[Scott Henderson](https://github.com/scottyhq) pointed out that running `python kvikio/python/benchmarks/single-node-io.py` prints nice diagnostic information that lets you check whether GDS is set up. Note that our system has "compatibility mode" enabled, so we don't see the benefits of GDS yet, but this was enough to wire everything up.

```
----------------------------------
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 WARNING - KvikIO compat mode
   libcufile.so not used
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
GPU               | Quadro GP100 (dev #0)
GPU Memory Total  | 16.00 GiB
BAR1 Memory Total | 256.00 MiB
GDS driver        | N/A (Compatibility Mode)
GDS config.json   | /etc/cufile.json
```
