---
title: "Enabling GPU-native analytics with Xarray and kvikIO"
date: "2022-08-25"
authors:
  - name: Deepak Cherian
    github: dcherian
summary: "An experiment with direct-to-GPU reads from a Zarr store using Xarray."
---

## TLDR

We [demonstrate](https://github.com/xarray-contrib/cupy-xarray/pull/10) registering an Xarray backend that reads data from a Zarr store directly to GPU memory as [CuPy arrays](https://cupy.dev) using the new [kvikIO library](https://docs.rapids.ai/api/kvikio/stable/) and [GPU Direct Storage](https://developer.nvidia.com/blog/gpudirect-storage/) technology.

## Background

### What is GPU Direct Storage

Quoting [this NVIDIA blogpost](https://developer.nvidia.com/blog/gpudirect-storage/):

> I/O, the process of loading data from storage to GPUs for processing, has historically been controlled by the CPU. As computation shifts from slower CPUs to faster GPUs, I/O becomes more of a bottleneck to overall application performance.
> Just as GPUDirect RDMA (Remote Direct Memory Address) improved bandwidth and latency when moving data directly between a network interface card (NIC) and GPU memory, a new technology called GPUDirect Storage enables a direct data path between local or remote storage, like NVMe or NVMe over Fabric (NVMe-oF), and GPU memory.
> Both GPUDirect RDMA and GPUDirect Storage avoid extra copies through a bounce buffer in the CPU's memory and enable a direct memory access (DMA) engine near the NIC or storage to move data on a direct path into or out of GPU memory, all without burdening the CPU or GPU.
> For GPUDirect Storage, storage location doesn't matter; it could be inside an enclosure, within the rack, or connected over the network.

<p align="center">
<img src="https://developer.nvidia.com/blog/wp-content/uploads/2019/08/GPUDirect-Fig-1-New.png" />
</p>
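
The "extra copies through a bounce buffer" that GDS avoids have a rough host-memory analogy in Python: slicing to `bytes` copies data, while a `memoryview` shares the same underlying buffer. This is only an illustration of why avoiding intermediate copies matters; GDS itself works at the driver/DMA level, not in Python:

```python
# Rough analogy for the bounce-buffer copies GDS avoids (illustrative only;
# GDS operates at the driver/DMA level, not in Python).
buf = bytearray(b"sensor data " * 4)  # pretend this is a staging buffer

copied = bytes(buf[:11])      # slicing to bytes makes an extra copy
view = memoryview(buf)[:11]   # a memoryview shares the underlying memory

buf[0:6] = b"SENSOR"          # the producer updates the buffer in place
print(copied)                 # b'sensor data'  -- the copy is stale
print(bytes(view))            # b'SENSOR data'  -- the view sees the update
```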

### What is kvikIO

> kvikIO is a Python library providing bindings to cuFile, which enables GPUDirect Storage (GDS).

For Xarray, the key bit is that kvikIO exposes a Zarr store, [`kvikio.zarr.GDSStore`](https://docs.rapids.ai/api/kvikio/stable/api.html#zarr), that does all the hard work for us. Since Xarray knows how to read Zarr stores, we can adapt that machinery to create a new storage backend that uses `kvikio`. And thanks to recent work funded by the Chan Zuckerberg Initiative, creating and registering a [new backend](https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html) is quite easy!

## Integrating with Xarray

Getting all this to work nicely requires three in-progress pull requests:

1. [Teach Zarr to handle alternative array classes](https://github.com/zarr-developers/zarr-python/pull/934)
2. [Rewrite a small bit of Xarray to not cast all data to a numpy array after reading from disk](https://github.com/pydata/xarray/pull/6874)
3. [Make a backend that connects Xarray to kvikIO](https://github.com/xarray-contrib/cupy-xarray/pull/10)

Writing the backend for Xarray was relatively easy; most of the code was copied over from the existing Zarr backend. Most of the effort was in ensuring that dimension coordinates could be read directly into host memory without raising an error. This is required because Xarray creates `pandas.Index` objects for such variables. In the future, we could consider using `cudf.Index` instead to allow a fully GPU-backed Xarray object.
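
The host-versus-device split described above can be restated as a tiny rule. The sketch below is illustrative only (the helper name is made up, and this is not the actual cupy-xarray backend code): a dimension coordinate is a 1-D variable whose name matches its dimension, and only those must land in host memory.

```python
def read_target(name: str, dims: tuple) -> str:
    """Hypothetical helper: decide where a variable read should land.

    A dimension coordinate is a 1-D variable whose name matches its
    dimension. Xarray builds a pandas.Index from it, so it must live in
    host memory; every other variable can go straight to GPU memory.
    """
    if len(dims) == 1 and name == dims[0]:
        return "host"
    return "device"

print(read_target("lat", ("lat",)))                # host
print(read_target("air", ("time", "lat", "lon")))  # device
```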

## Usage

Assuming you have all the pieces together (see Appendix I and Appendix II for step-by-step instructions), using all this cool technology only requires adding `engine="kvikio"` to your `open_dataset` call (!)

```python
import xarray as xr

ds = xr.open_dataset("file.zarr", engine="kvikio", consolidated=False)
```

Notice that importing `cupy_xarray` was not needed. `cupy_xarray` uses [entrypoints](https://packaging.python.org/en/latest/specifications/entry-points/) to register the kvikIO backend with Xarray.

With this, `ds.load()` will load directly to GPU memory and `ds` will now contain CuPy arrays. At present there are a few limitations:

1. stores cannot be read with consolidated metadata, and
2. compression is unsupported by the backend.

## Quick demo

First create an example uncompressed dataset to read from:

```python
import xarray as xr

store = "./air-temperature.zarr"

airt = xr.tutorial.open_dataset("air_temperature", engine="netcdf4")

# Disable compression for every variable; compressed chunks are
# unsupported by the kvikIO backend at present.
for var in airt.variables:
    airt[var].encoding["compressor"] = None
airt.to_zarr(store, mode="w", consolidated=True)
```

Now read:

```python
# consolidated must be False
ds = xr.open_dataset(store, engine="kvikio", consolidated=False)
ds.air
```

```
<xarray.DataArray 'air' (time: 2920, lat: 25, lon: 53)>
[3869000 values with dtype=float32]
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    GRIB_id:       11
    GRIB_name:     TMP
    actual_range:  [185.16000366210938, 322.1000061035156]
    dataset:       NMC Reanalysis
    level_desc:    Surface
    long_name:     4xDaily Air temperature at sigma level 995
    parent_stat:   Other
    precision:     2
    statistic:     Individual Obs
    units:         degK
    var_desc:      Air temperature
```

Note that we get Xarray's lazy backend arrays by default, and that the dimension coordinate variables `lat`, `lon`, and `time` were read. At this point this looks identical to what we get with a standard `xr.open_dataset(store, engine="zarr")` command.
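
The laziness is easy to picture with a schematic stand-in: metadata like shape and dtype is available as soon as the dataset is opened, but bytes only move when a value is actually requested. This sketch is purely illustrative and does not reflect Xarray's real internal classes:

```python
class LazyArray:
    """Schematic stand-in for a lazy backend array (not Xarray's internals)."""

    def __init__(self, shape, dtype, loader):
        self.shape = shape      # metadata is known up front...
        self.dtype = dtype
        self._loader = loader   # ...but no bytes are read yet
        self.loaded = False

    def __getitem__(self, key):
        self.loaded = True      # reading happens on first access
        return self._loader(key)

arr = LazyArray((2920, 25, 53), "float32", loader=lambda key: 0.0)
print(arr.loaded)  # False -- opening the dataset moved no data
arr[0, 0, 0]
print(arr.loaded)  # True  -- indexing triggered the read
```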

Now load a small subset:

```python
type(ds["air"].isel(time=0, lat=10).load().data)
```

```
cupy._core.core.ndarray
```

Success!

Xarray integrates [decently well](https://cupy-xarray.readthedocs.io/quickstart.html) with CuPy arrays, so you should be able to test out analysis pipelines pretty easily.

## Cool demo

We don't have a cool demo yet but are looking to develop one very soon! Reach out if you have ideas. We would love to hear from you.

## Summary

We demonstrate integrating the kvikIO library using Xarray's new backend entrypoints. With everything set up, simply adding `engine="kvikio"` enables direct-to-GPU reads from disk or over the network.

## Acknowledgments

My time on this project was funded by NASA-OSTFL 80NSSC22K0345 "Enhancing analysis of NASA data with the open-source Python Xarray Library".

## Appendix I: Step-by-step install instructions

[Wei Ji Leong](https://github.com/weiji14) helpfully [provided steps](https://discourse.pangeo.io/t/favorite-way-to-go-from-netcdf-xarray-to-torch-tf-jax-et-al/2663/2) to get started on your machine:

```bash
# May need to install nvidia-gds first
# https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu-installation-common
sudo apt install nvidia-gds

git clone https://github.com/dcherian/cupy-xarray.git
cd cupy-xarray

mamba create --name cupy-xarray python=3.9 cupy=11.0 rapidsai-nightly::kvikio=22.10 jupyterlab=3.4.5 pooch=1.6.0 netcdf4=1.6.0 watermark=2.3.1
mamba activate cupy-xarray
python -m ipykernel install --user --name cupy-xarray

# https://github.com/pydata/xarray/pull/6874
pip install git+https://github.com/dcherian/xarray.git@kvikio
# https://github.com/zarr-developers/zarr-python/pull/934
pip install git+https://github.com/madsbk/zarr-python.git@cupy_support
# https://github.com/xarray-contrib/cupy-xarray/pull/10
git switch kvikio-entrypoint
pip install --editable=.

# Start jupyter lab
jupyter lab --no-browser
# Then open the docs/kvikio.ipynb notebook
```

## Appendix II: Making sure GDS is working

[Scott Henderson](https://github.com/scottyhq) pointed out that running `python kvikio/python/benchmarks/single-node-io.py` prints nice diagnostic information that lets you check whether GDS is set up. Note that on our system "compatibility mode" is enabled, so we don't see the benefits of GDS yet, but this was enough to wire everything up.

```
----------------------------------
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING - KvikIO compat mode
libcufile.so not used
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
GPU               | Quadro GP100 (dev #0)
GPU Memory Total  | 16.00 GiB
BAR1 Memory Total | 256.00 MiB
GDS driver        | N/A (Compatibility Mode)
GDS config.json   | /etc/cufile.json
```