kazarr

A lightweight FastAPI service that exposes endpoints to interact with Zarr datasets stored in a Simple Storage Service (S3):

a datasets endpoint to explore available multi-dimensional arrays,
an extraction endpoint to slice and dice data,
a probe endpoint to query specific values at given coordinates,
an isoline endpoint to compute contour lines dynamically.

API

Tip

You can find auto-generated documentation about API at endpoints /docs or /redoc

/health (GET)

Check for service's health, return a json object with a single member status.

/datasets (GET)

Return a list of all available Zarr datasets with their id and description.

/datasets/{dataset} (GET)

Return metadata (dimensions, variables, attributes) for a specific Zarr dataset. The dataset parameter is expected to be the dataset id, that can be found with the previous endpoint.

/datasets/{dataset}/extract (GET)

Extracts a subset of the data based on a bounding box and a specific variable.

Warning

Large extractions may impact performance. Be mindful of the bounding box size for high-resolution datasets.

The extract endpoint accepts the following query parameters:

Name	Description	Optional
`variable`	The variable to extract.	✗
`lon_min`	Minimum longitude of the bounding box.	✓
`lat_min`	Minimum latitude of the bounding box.	✓
`lon_max`	Maximum longitude of the bounding box.	✓
`lat_max`	Maximum latitude of the bounding box.	✓
`time`	The time value/slice to extract.	✓
`resolution_limit`	Limit the amount of data for lat/lon axis (decimate)	✓
`format`	Format of the extracted data (Supported: `raw`, `geojson`). Ignored when `mesh_tile_size` is defined	✓
`mesh_tile_size`	Return data as mesh, with a grid of `mesh_tile_size`x`mesh_tile_size`	✓
`mesh_interpolate`	Apply interpolation to mesh values	✓
`as_dims`	If a variable has the same name as a dim, force query parameters in this list to be treated as dimensions	✓

Important

You may need to specify additional non-generic variables or dimensions according to your dataset. To do so, you can add query parameters with &my_additional_variable={VALUE}

/datasets/{dataset}/probe (GET)

Retrieves the values of specified variables at a specific geographical location (point query).

The probe endpoint accepts the following query parameters:

Name	Description	Optional
`variables`	The list of variables to probe.	✗
`lon`	The longitude coordinate to probe.	✗
`lat`	The latitude coordinate to probe.	✗
`height`	The height coordinate to probe (if 3D data).	✓
`as_dims`	If a variable has the same name as a dim, force query parameters in this list to be treated as dimensions	✓

Important

You may need to specify additional non-generic variables or dimensions according to your dataset. To do so, you can add query parameters with &my_additional_variable={VALUE}

Tip

You can request multiple variables at once by repeating the variables parameter in the query string (e.g., ?variables=temp&variables=wind).

/datasets/{dataset}/isoline (GET)

Computes isolines (contour lines) for a given variable and specific levels.

The isoline endpoint accepts the following query parameters:

Name	Description	Optional
`variable`	The variable to generate isolines for.	✗
`levels`	Comma-separated list of levels for isoline generation.	✗
`time`	The time value to use for isoline generation.	✓
`format`	Format of the extracted data (Supported: `raw`, `geojson`). Ignored when `mesh_tile_size` is defined	✓
`as_dims`	If a variable has the same name as a dim, force query parameters in this list to be treated as dimensions	✓

Important

You may need to specify additional non-generic variables or dimensions according to your dataset. To do so, you can add query parameters with &my_additional_variable={VALUE}

/datasets/{dataset}/select (GET)

You can select data in a generic way with this endpoint, as raw multi-dimensional arrays. If you just specify the variable parameter, you will get all the data, but you are free to add additional parameters to fix some variables/dimensions

The select endpoint accepts the following query parameters:

Name	Description	Optional
`variable`	The variable from which you want to select the data.	✗
`as_dims`	If a variable has the same name as a dim, force query parameters in this list to be treated as dimensions	✓

Configuring

Environment variables

Variable	Description	Default value
PORT	The port to be used when exposing the service	8000
HOSTNAME	The hostname to be used when exposing the service	localhost
AWS_ACCESS_KEY_ID	Access key ID of the S3 in which zarr data is stored
AWS_SECRET_ACCESS_KEY	Secret access key of the S3 in which zarr data is stored
AWS_DEFAULT_REGION	Region of the S3 in which zarr data is stored
AWS_ENDPOINT_URL	Endpoint URL of the S3 in which zarr data is stored
BUCKET_NAME	The name of the bucket in which zarr data is stored
DATASETS_PATH	Path to the JSON file containing datasets description	datasets.json
CACHE_DIR	Path to the directory where cache will be stored. Cache will not be used if this variable is not provided
CACHE_SIZE	Max size of cache folder (e.g. 1024KB, 512MB, 4GB)	512MB

Important

With some S3 provider, some errors about checksum calculation can occur (error: botocore.exceptions.ClientError: An error occurred (InvalidArgument) when calling the PutObject operation: x-amz-content-sha256 must be UNSIGNED-PAYLOAD, or a valid sha256 value.). In that case, you should set AWS_REQUEST_CHECKSUM_CALCULATION environment variable to when_required

Usage

Manual build

You can build the image with the following command:

docker build -t <your-image-name> .

And then start the service with:

docker run -p 8000:8000 <your-image-name>

Run locally

You will need to install multiple Python packages to run this app. To simplify, you can install Anaconda (or micromamba) and run these commands :

conda create -y -n kazarr_env python=3.11

conda install -y -n kazarr_env -c conda-forge \
  fastapi \
  uvicorn \
  xarray \
  zarr \
  cfgrib \
  numpy \
  pyproj \
  dask \
  s3fs \
  matplotlib \
  pyvista \
  scipy

conda activate kazarr_env

python main.py

Local S3

You can run a local object storage with S3-compliant API using garage with CLI access using s3cmd (pipx install s3cmd).

First, generate a secret with openssl rand -base64 32 and create a garage configuration file:

metadata_dir = "/home/luc/Development/GeoData/s3-meta"
data_dir = "/home/luc/Development/GeoData/s3"
db_engine = "sqlite"

replication_factor = 1

rpc_bind_addr = "[::]:3901"
rpc_public_addr = "127.0.0.1:3901"
rpc_secret = "your secret"

[s3_api]
s3_region = "localhost"
api_bind_addr = "[::]:3900"
root_domain = ".s3.garage.localhost"

[s3_web]
bind_addr = "[::]:3902"
root_domain = ".web.garage.localhost"
index = "index.html"

Then launch the server with garage -c ./garage.toml server in a terminal and get your node ID in another terminal with garage -c ./garage.toml status.

Create the layout of your cluster with garage -c ./garage.toml layout assign -z localhost -c 500G nodeID && garage -c ./garage.toml layout apply --version 1.

Create a bucket with garage -c ./garage.toml bucket create zarr-data.

Create an access key with garage -c ./garage.toml key create zarr-data-key.

Allow the key to access your bucket garage -c ./garage.toml bucket allow --read --write --owner zarr-data --key zarr-data-key.

Create a s3cmd configuration file:

[default]
access_key = your-key-id
secret_key = your-key-secret
host_base = http://localhost:3900
host_bucket = http://localhost:3900
use_https = False

Then synchronize any data from your local file system to garage with s3cmd -c ./s3cmd.cfg sync ./zarr-data/ s3://zarr-data.

Contributing

Please read the Contributing file for details on our code of conduct, and the process for submitting pull requests to us.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Preparing Datasets

An extra tool allow you to generate Zarr datasets from NetCDF or GRIB2 files. For more detail, check the conversion tool

Authors

This project is sponsored by

License

This project is licensed under the MIT License - see the license file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
.github/workflows		.github/workflows
conversion_tool		conversion_tool
scripts		scripts
src		src
test		test
.dockerignore		.dockerignore
.gitignore		.gitignore
.gitmodules		.gitmodules
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

kazarr

API

/health (GET)

/datasets (GET)

/datasets/{dataset} (GET)

/datasets/{dataset}/extract (GET)

/datasets/{dataset}/probe (GET)

/datasets/{dataset}/isoline (GET)

/datasets/{dataset}/select (GET)

Configuring

Environment variables

Usage

Manual build

Run locally

Local S3

Contributing

Versioning

Preparing Datasets

Authors

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

License

kalisio/kazarr

Folders and files

Latest commit

History

Repository files navigation

kazarr

API

/health (GET)

/datasets (GET)

/datasets/{dataset} (GET)

/datasets/{dataset}/extract (GET)

/datasets/{dataset}/probe (GET)

/datasets/{dataset}/isoline (GET)

/datasets/{dataset}/select (GET)

Configuring

Environment variables

Usage

Manual build

Run locally

Local S3

Contributing

Versioning

Preparing Datasets

Authors

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages