Faster querying of Cloud-Optimized GeoTIFFs (COGs) with fewer HTTP requests in your workflows, currently tested for Sentinel-2 and Landsat COG files.

> [!WARNING]
> Work-in-progress library. The APIs are subject to change, and as such, documentation is not yet available.

## Table of Contents

- [Features](#-features)
- [Why Rasteret?](#why-this-library)
- [Built-in Data Sources](#-built-in-data-sources)
- [Prerequisites](#-prerequisites)
- [Installation](#-installation)
- [Quick Start](#-quick-start)
- [License](#-license)
- [Contributing](#-contributing)

---

## 🚀 Features

- Fast byte-range based COG access
- STAC GeoParquet creation with COG header metadata
- Requester-pays S3 bucket support (Landsat on AWS S3)
- Xarray and GeoDataFrame outputs
- Parallel data loading
- Simple high-level API

---

## Why this library?

### 💡 The Problem

Currently, reading satellite imagery from COGs requires multiple HTTP requests:

- An initial request to read the COG headers
- Additional requests if a header is split across several byte ranges
- Final requests for the actual data tiles
- These requests are repeated in every new environment:
  - New Python environments (such as parallel AWS Lambda functions or parallel Docker containers in Kubernetes)
  - Local environments after the GDAL cache is cleared (such as after a Jupyter kernel restart or a laptop restart)

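To make the repeated cost concrete, here is a rough request-count model of the list above. It is a sketch under stated assumptions: `estimate_requests` is a hypothetical helper, and the two header reads per file, four tiles per file, and ten fresh workers are illustration values, not measurements of GDAL or Rasteret. The 40 files mirror the benchmark setup described later.

```python
def estimate_requests(
    n_files: int,
    tiles_per_file: int,
    header_reads_per_file: int = 1,
    n_environments: int = 1,
    headers_cached: bool = False,
) -> int:
    """Rough total of HTTP GET requests across all environments (illustrative model).

    Without a header cache, every fresh environment re-reads each file's
    header before fetching tiles; with cached headers only tile GETs remain.
    """
    header_requests = 0 if headers_cached else n_files * header_reads_per_file
    tile_requests = n_files * tiles_per_file  # assumes one GET per required tile
    return n_environments * (header_requests + tile_requests)


# Assumed workload: 40 COG files, 4 tiles each, 2 header reads per file, 10 fresh workers.
naive = estimate_requests(40, tiles_per_file=4, header_reads_per_file=2, n_environments=10)
cached = estimate_requests(40, tiles_per_file=4, n_environments=10, headers_cached=True)
print(naive, cached)  # 2400 1600
```

The model deliberately ignores request merging, retries, and overview reads; it only shows that header requests scale with the number of fresh environments, which is the part a header cache removes.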
### ✨ Rasteret's Solution
43
+
44
+
Rasteret reimagines how we access cloud-hosted satellite imagery by:
45
+
- Creating local 'collections' with pre-cached COG file headers along with STAC metadata
46
+
- Calculating exact byte ranges for image tiles needed, without header requests
47
+
- Making single optimized HTTP request per required tile
48
+
- Ensuring COG file headers are never re-read across new Python environments
49
+
50
+
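The byte-range step can be illustrated with a minimal sketch. Once a file's TIFF tile tags are cached locally, the byte range of any tile follows from arithmetic alone: TIFF numbers tiles left-to-right, top-to-bottom, and the `TileOffsets`/`TileByteCounts` tags give each tile's start and size. The `CachedCogHeader` class, both functions, the placeholder URL, and the toy offsets are hypothetical and not Rasteret's API; the sketch assumes a single-band image at full resolution (one IFD, no overviews).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedCogHeader:
    """Hypothetical cached header record; fields mirror the TIFF tags
    ImageWidth, ImageLength, TileWidth, TileLength, TileOffsets, TileByteCounts."""
    url: str
    image_width: int
    image_height: int
    tile_width: int
    tile_height: int
    tile_offsets: tuple[int, ...]
    tile_byte_counts: tuple[int, ...]


def tile_byte_range(header: CachedCogHeader, row: int, col: int) -> tuple[int, int]:
    """Return the inclusive (start, end) byte range of the tile containing pixel (row, col)."""
    if not (0 <= row < header.image_height and 0 <= col < header.image_width):
        raise ValueError("pixel outside image")
    tiles_across = math.ceil(header.image_width / header.tile_width)
    # Tiles are stored in row-major order: left-to-right, then top-to-bottom.
    tile_index = (row // header.tile_height) * tiles_across + (col // header.tile_width)
    start = header.tile_offsets[tile_index]
    end = start + header.tile_byte_counts[tile_index] - 1  # HTTP ranges are inclusive
    return start, end


def range_header(start: int, end: int) -> dict[str, str]:
    """Build the HTTP Range header for a single GET request."""
    return {"Range": f"bytes={start}-{end}"}


# Toy values for a 1024x1024 image split into four 512x512 tiles.
header = CachedCogHeader(
    url="https://example.com/B04.tif",  # placeholder URL
    image_width=1024, image_height=1024,
    tile_width=512, tile_height=512,
    tile_offsets=(1000, 5000, 9000, 13000),
    tile_byte_counts=(4000, 4000, 4000, 4000),
)
start, end = tile_byte_range(header, row=600, col=100)
print(range_header(start, end))  # {'Range': 'bytes=9000-12999'}
```

The resulting header can be passed to any HTTP client that supports range requests, so fetching a tile costs exactly one GET and no header read.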
### 📊 Performance Benchmarks

#### Speed Benchmarks

Test setup: filter one year of STAC items (100+ scenes), then process 20 of the filtered Sentinel-2 scenes over an agricultural area, reading the RED and NIR bands (40 COG files in total).

For details on why this library was made and how it reads multiple COGs quickly and efficiently, [read the blog post here](https://blog.terrafloww.com/efficient-cloud-native-raster-data-access-an-alternative-to-rasterio-gdal/).

The aim of this library is to reduce the number of API calls to S3 objects (COGs). Fewer calls mean less time spent on random file access and therefore faster time-series analysis, without needing to convert COGs to other formats such as Zarr or NetCDF.

It also reduces the cost incurred by readers of paid data sources such as Landsat on AWS, because the local collection of COG internal metadata significantly reduces the number of GET and LIST requests.
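The effect of a local metadata collection on header requests can be sketched as follows. Rasteret stores its collections as STAC GeoParquet; this sketch swaps in a plain JSON file from the standard library to stay self-contained, and the function names, file name, and URLs are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_header_cache(path: Path, headers: dict[str, dict]) -> None:
    """Write per-file header metadata, keyed by COG URL, to a local JSON file."""
    path.write_text(json.dumps(headers))


def load_header_cache(path: Path) -> dict[str, dict]:
    """Reload cached header metadata from disk; no HTTP request is made."""
    return json.loads(path.read_text())


def header_requests_needed(urls: list[str], cache: dict[str, dict]) -> int:
    """Count the header GETs still required: one per URL missing from the cache."""
    return sum(1 for url in urls if url not in cache)


urls = [f"https://example.com/scene_{i}/B04.tif" for i in range(3)]  # placeholder URLs
cache_file = Path(tempfile.mkdtemp()) / "collection_headers.json"
save_header_cache(cache_file, {u: {"tile_width": 512} for u in urls[:2]})

# A fresh process only needs the cache file, not the original header reads.
cache = load_header_cache(cache_file)
print(header_requests_needed(urls, cache))  # 1
```

Only the uncached file still needs a header GET; every file already in the collection can go straight to tile requests.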