|
4 | 4 |
|
5 | 5 | _An Xarray extension for Google Earth Engine._ |
6 | 6 |
|
| 7 | +Xee bridges the gap between Google Earth Engine's massive data catalog and the scientific Python ecosystem. It provides a custom Xarray backend that allows you to open any `ee.ImageCollection` as if it were a local `xarray.Dataset`. Data is loaded lazily and in parallel, enabling you to work with petabyte-scale archives of satellite and climate data using the power and flexibility of Xarray and its integrations with libraries like Dask. |
| 8 | + |
7 | 9 | [](https://pypi.python.org/pypi/xee) |
8 | 10 | [](https://pepy.tech/project/xee) |
9 | 11 | [![Conda |
@@ -32,85 +34,206 @@ Then, authenticate Earth Engine: |
32 | 34 | earthengine authenticate --quiet |
33 | 35 | ``` |
34 | 36 |
|
35 | | -Now, in your Python environment, make the following imports: |
| 37 | +Now, in your Python environment, make the following imports and initialize the Earth Engine client with your project ID. Using the high-volume API endpoint is recommended. |
36 | 38 |
|
37 | 39 | ```python |
38 | 40 | import ee |
39 | | -import xarray |
| 41 | +import xarray as xr |
| 42 | +from xee import helpers |
| 43 | +import shapely |
| 44 | + |
| 45 | +ee.Initialize( |
| 46 | + project='PROJECT-ID', # Replace with your project ID |
| 47 | + opt_url='https://earthengine-highvolume.googleapis.com' |
| 48 | +) |
40 | 49 | ``` |
41 | 50 |
|
42 | | -Next, specify your EE-registered cloud project ID and initialize the EE client |
43 | | -with the high volume API: |
| 51 | +### Specifying the Output Grid |
| 52 | + |
| 53 | +To open a dataset, you must specify the desired output pixel grid. The `xee.helpers` module simplifies this process by providing several convenient workflows, summarized below. |
| 54 | + |
| 55 | +| Goal | Method | When to Use | |
| 56 | +| :--- | :--- | :--- | |
| 57 | +| **Match Source Grid** | Use `helpers.extract_grid_params()` to get the parameters from an EE object. | When you want the data in its original, default projection and scale. | |
| 58 | +| **Fit Area to a Shape** | Use `helpers.fit_geometry()` with the `geometry` and `grid_shape` arguments. | When you need a consistent output array size (e.g., for ML models) and the exact pixel size is less important. | |
| 59 | +| **Fit Area to a Scale** | Use `helpers.fit_geometry()` with the `geometry` and `grid_scale` arguments. | When the specific resolution (e.g., 30 meters, 0.01 degrees) is critical for your analysis. | |
| 60 | +| **Manual Override** | Pass `crs`, `crs_transform`, and `shape_2d` directly to `xr.open_dataset`. | For advanced cases where you already have an exact grid definition. | |
| 61 | + |
| 62 | +> **Important Note on Units:** All grid parameter values must be in the units of the specified Coordinate Reference System (`crs`). |
| 63 | +> * For a geographic CRS like `'EPSG:4326'`, the units are in **degrees**. |
| 64 | +> * For a projected CRS like `'EPSG:32610'` (UTM), the units are in **meters**. |
| 65 | +> This applies to the translation values in `crs_transform` and the pixel sizes in `grid_scale`. |
| 66 | +
|
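To make the unit rule concrete, here is a minimal sketch of how a six-element `crs_transform` maps a pixel index to a coordinate. It assumes the affine ordering `(x_scale, x_shear, x_translate, y_shear, y_scale, y_translate)`, which is consistent with how the examples in this README index the transform. `pixel_to_crs` is a hypothetical helper written for illustration, not part of Xee, and the UTM values are made up.

```python
def pixel_to_crs(crs_transform, col, row):
    """Map a pixel's (col, row) index to the CRS coordinates of its origin corner.

    Assumes the ordering (x_scale, x_shear, x_translate,
    y_shear, y_scale, y_translate); illustrative only.
    """
    x_scale, x_shear, x_translate, y_shear, y_scale, y_translate = crs_transform
    x = x_scale * col + x_shear * row + x_translate
    y = y_shear * col + y_scale * row + y_translate
    return x, y


# EPSG:4326: scales and translations are in degrees.
degrees_transform = (0.1, 0, -180.05, 0, -0.1, 90.05)
origin_deg = pixel_to_crs(degrees_transform, 0, 0)

# EPSG:32610 (UTM): same tuple layout, but every value is in meters.
# These numbers are hypothetical and chosen only for the example.
meters_transform = (30, 0, 500000, 0, -30, 4200000)
corner_m = pixel_to_crs(meters_transform, 10, 20)
```

Pixel `(0, 0)` lands on the translation values themselves, which is why those values must be expressed in the same units as the CRS.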
| 67 | +### Usage Examples |
| 68 | + |
| 69 | +Here are common workflows for opening datasets with `xee`, corresponding to the methods in the table above. |
| 70 | + |
| 71 | +#### Match Source Grid |
| 72 | + |
| 73 | +This is the simplest case, using `helpers.extract_grid_params` to match the dataset's default grid. |
44 | 74 |
|
45 | 75 | ```python |
46 | | -ee.Initialize( |
47 | | - project='my-project-id' |
48 | | - opt_url='https://earthengine-highvolume.googleapis.com') |
| 76 | +ic = ee.ImageCollection('ECMWF/ERA5_LAND/MONTHLY_AGGR') |
| 77 | +grid_params = helpers.extract_grid_params(ic) |
| 78 | +ds = xr.open_dataset(ic, engine='ee', **grid_params) |
49 | 79 | ``` |
50 | 80 |
|
51 | | -Open any Earth Engine ImageCollection by specifying the Xarray engine as `'ee'`: |
| 81 | +#### Fit Area to a Shape |
| 82 | + |
| 83 | +Define a grid over an area of interest by specifying the number of pixels. `helpers.fit_geometry` will calculate the correct `crs_transform`. |
52 | 84 |
|
53 | 85 | ```python |
54 | | -ds = xarray.open_dataset('ee://ECMWF/ERA5_LAND/HOURLY', engine='ee') |
| 86 | +aoi = shapely.geometry.box(113.33, -43.63, 153.56, -10.66) # Australia |
| 87 | +grid_params = helpers.fit_geometry( |
| 88 | + geometry=aoi, |
| 89 | + grid_crs='EPSG:4326', |
| 90 | + grid_shape=(256, 256) |
| 91 | +) |
| 92 | + |
| 93 | +ds = xr.open_dataset('ee://ECMWF/ERA5_LAND/MONTHLY_AGGR', engine='ee', **grid_params) |
55 | 94 | ``` |
56 | 95 |
|
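As a rough illustration of the arithmetic behind fitting an area to a shape, the sketch below derives a pixel size and a `crs_transform` from a bounding box and a pixel count. It is not Xee's implementation: `transform_from_shape` is a hypothetical function, and the `(width, height)` ordering of the shape is an assumption. It uses a negative y-scale with a top-left origin.

```python
def transform_from_shape(bounds, shape):
    """Derive an affine transform that spreads `shape` pixels over `bounds`.

    bounds: (min_x, min_y, max_x, max_y) in CRS units.
    shape:  (width, height) in pixels (ordering assumed for this sketch).
    """
    min_x, min_y, max_x, max_y = bounds
    width, height = shape
    x_scale = (max_x - min_x) / width
    y_scale = -(max_y - min_y) / height  # negative: rows run north to south
    # Top-left origin: the grid starts at (min_x, max_y).
    return (x_scale, 0, min_x, 0, y_scale, max_y)


# Bounding box of the Australia example, in degrees (EPSG:4326).
australia = (113.33, -43.63, 153.56, -10.66)
transform = transform_from_shape(australia, (256, 256))
print(transform)
```

With a fixed shape, the pixel size falls out of the area's extent, so the pixels in this case are not square: roughly 0.157 degrees wide by 0.129 degrees tall.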
57 | | -Open all bands in a specific projection (not the Xee default): |
| 96 | +#### Fit Area to a Scale (Resolution) |
| 97 | + |
| 98 | +> **A Note on `grid_scale` and Y-Scale Orientation** |
| 99 | +> When using `fit_geometry` with `grid_scale`, you are defining both the pixel size and the grid's orientation via the sign of the y-scale. |
| 100 | +> * A **negative `y_scale`** (e.g., `(10000, -10000)`) is the standard for "north-up" satellite and aerial imagery, creating a grid with a **top-left** origin. |
| 101 | +> * A **positive `y_scale`** (e.g., `(10000, 10000)`) is used by some datasets and creates a grid with a **bottom-left** origin. |
| 102 | +> You may need to inspect your source dataset's projection information to determine the correct sign to use. If you use `grid_shape`, a standard negative y-scale is assumed. |
| 103 | +
|
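The effect of the y-scale sign can be sketched with plain arithmetic. The hypothetical `grid_from_scale` function below is not Xee's implementation. It only illustrates, under the assumptions that the shape is `(width, height)` and that partial pixels are rounded up, how the sign of the y-scale moves the grid origin between the top-left and bottom-left corners.

```python
import math


def grid_from_scale(bounds, scale):
    """Sketch: cover `bounds` with pixels of size `scale` (CRS units).

    bounds: (min_x, min_y, max_x, max_y); scale: (x_scale, y_scale).
    Returns (crs_transform, shape), with shape as (width, height).
    """
    min_x, min_y, max_x, max_y = bounds
    x_scale, y_scale = scale
    width = math.ceil((max_x - min_x) / abs(x_scale))
    height = math.ceil((max_y - min_y) / abs(y_scale))
    # Negative y-scale: start at the top edge; positive: start at the bottom.
    y_origin = max_y if y_scale < 0 else min_y
    return (x_scale, 0, min_x, 0, y_scale, y_origin), (width, height)


bounds_m = (0, 0, 100000, 50000)  # hypothetical extent in meters
top_left, shape = grid_from_scale(bounds_m, (10000, -10000))
bottom_left, _ = grid_from_scale(bounds_m, (10000, 10000))
```

Both grids cover the same extent with the same number of pixels. Only the y-translation and the direction in which rows advance differ.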
| 104 | +The following example defines a grid over an area by specifying the pixel size in meters. `fit_geometry` will reproject the geometry and calculate the correct `shape_2d`. |
58 | 105 |
|
59 | 106 | ```python |
60 | | -ds = xarray.open_dataset('ee://ECMWF/ERA5_LAND/HOURLY', engine='ee', |
61 | | - crs='EPSG:4326', scale=0.25) |
| 107 | +aoi = shapely.geometry.box(113.33, -43.63, 153.56, -10.66) # Australia |
| 108 | +grid_params = helpers.fit_geometry( |
| 109 | + geometry=aoi, |
| 110 | + geometry_crs='EPSG:4326', # CRS of the input geometry |
| 111 | + grid_crs='EPSG:32662', # Target CRS in meters (Plate Carrée) |
| 112 | + grid_scale=(10000, -10000) # Define a 10km pixel size |
| 113 | +) |
| 114 | + |
| 115 | +ds = xr.open_dataset('ee://ECMWF/ERA5_LAND/MONTHLY_AGGR', engine='ee', **grid_params) |
62 | 116 | ``` |
63 | 117 |
|
64 | | -Open an ImageCollection (maybe, with EE-side filtering or processing): |
| 118 | +#### Open a Custom Region at Source Resolution |
| 119 | + |
| 120 | +This workflow is ideal for analyzing a specific area while maintaining the dataset's original resolution. |
65 | 121 |
|
66 | 122 | ```python |
67 | | -ic = ee.ImageCollection('ECMWF/ERA5_LAND/HOURLY').filterDate( |
68 | | - '1992-10-05', '1993-03-31') |
69 | | -ds = xarray.open_dataset(ic, engine='ee', crs='EPSG:4326', scale=0.25) |
| 123 | +# 1. Get the original grid parameters from the target ImageCollection |
| 124 | +ic = ee.ImageCollection('ECMWF/ERA5_LAND/MONTHLY_AGGR') |
| 125 | +source_params = helpers.extract_grid_params(ic) |
| 126 | + |
| 127 | +# 2. Extract the source CRS and scale |
| 128 | +source_crs = source_params['crs'] |
| 129 | +source_transform = source_params['crs_transform'] |
| 130 | +source_scale = (source_transform[0], source_transform[4]) # (x_scale, y_scale) |
| 131 | + |
| 132 | +# 3. Use the source parameters to fit the grid to a specific geometry |
| 133 | +aoi = shapely.geometry.box(113.33, -43.63, 153.56, -10.66) # Australia |
| 134 | +final_grid_params = helpers.fit_geometry( |
| 135 | + geometry=aoi, |
| 136 | + geometry_crs='EPSG:4326', |
| 137 | + grid_crs=source_crs, |
| 138 | + grid_scale=source_scale |
| 139 | +) |
| 140 | + |
| 141 | +# 4. Open the dataset with the final, combined parameters |
| 142 | +ds = xr.open_dataset(ic, engine='ee', **final_grid_params) |
70 | 143 | ``` |
71 | 144 |
|
72 | | -Open an ImageCollection with a specific EE projection or geometry: |
| 145 | +#### Manual Override |
| 146 | + |
| 147 | +For use cases where you know the exact grid parameters, you can provide them directly. |
73 | 148 |
|
74 | 149 | ```python |
75 | | -ic = ee.ImageCollection('ECMWF/ERA5_LAND/HOURLY').filterDate( |
76 | | - '1992-10-05', '1993-03-31') |
77 | | -leg1 = ee.Geometry.Rectangle(113.33, -43.63, 153.56, -10.66) |
78 | | -ds = xarray.open_dataset( |
79 | | - ic, |
| 150 | +# Manually define a 512x512 pixel grid with 0.1-degree pixels in EPSG:4326 |
| 151 | +manual_crs = 'EPSG:4326' |
| 152 | +manual_transform = (0.1, 0, -180.05, 0, -0.1, 90.05) # Values are in degrees |
| 153 | +manual_shape = (512, 512) |
| 154 | + |
| 155 | +ds = xr.open_dataset( |
| 156 | + 'ee://ECMWF/ERA5_LAND/MONTHLY_AGGR', |
80 | 157 | engine='ee', |
81 | | - projection=ic.first().select(0).projection(), |
82 | | - geometry=leg1 |
| 158 | + crs=manual_crs, |
| 159 | + crs_transform=manual_transform, |
| 160 | + shape_2d=manual_shape, |
83 | 161 | ) |
84 | 162 | ``` |
85 | 163 |
|
86 | | -Open multiple ImageCollections into one `xarray.Dataset`, all with the same |
87 | | -projection: |
| 164 | +#### Open a Pre-Processed ImageCollection |
| 165 | + |
| 166 | +A key feature of Xee is its ability to open a computed `ee.ImageCollection`. This allows you to leverage Earth Engine's powerful server-side processing for tasks like filtering, band selection, and calculations before loading the data into Xarray. |
88 | 167 |
|
89 | 168 | ```python |
90 | | -ds = xarray.open_mfdataset( |
91 | | - ['ee://ECMWF/ERA5_LAND/HOURLY', 'ee://NASA/GDDP-CMIP6'], |
92 | | - engine='ee', crs='EPSG:4326', scale=0.25) |
| 169 | +# Define an AOI as a shapely object for the helper function |
| 170 | +sf_aoi_shapely = shapely.geometry.Point(-122.4, 37.7).buffer(0.2) |
| 171 | +# Create an ee.Geometry from the shapely object for server-side filtering |
| 172 | +coords = list(sf_aoi_shapely.exterior.coords) |
| 173 | +sf_aoi_ee = ee.Geometry.Polygon(coords) |
| 174 | + |
| 175 | +# Define a function to calculate NDVI and add it as a band |
| 176 | +def add_ndvi(image): |
| 177 | + # Landsat 9 SR bands: NIR = B5, Red = B4 |
| 178 | + ndvi = image.normalizedDifference(['SR_B5', 'SR_B4']).rename('NDVI') |
| 179 | + return image.addBands(ndvi) |
| 180 | + |
| 181 | +# Build the pre-processed collection |
| 182 | +processed_collection = (ee.ImageCollection('LANDSAT/LC09/C02/T1_L2') |
| 183 | + .filterDate('2024-06-01', '2024-09-01') |
| 184 | + .filterBounds(sf_aoi_ee) |
| 185 | + .map(add_ndvi) |
| 186 | + .select(['NDVI'])) |
| 187 | + |
| 188 | +# Define the output grid using a helper |
| 189 | +grid_params = helpers.fit_geometry( |
| 190 | + geometry=sf_aoi_shapely, |
| 191 | + grid_crs='EPSG:32610', # Target CRS in meters (UTM Zone 10N) |
| 192 | + grid_scale=(30, -30) # Use Landsat's 30m resolution |
| 193 | +) |
| 194 | + |
| 195 | +# Open the fully processed collection |
| 196 | +ds = xr.open_dataset(processed_collection, engine='ee', **grid_params) |
93 | 197 | ``` |
94 | 198 |
|
95 | | -Open a single Image by passing it to an ImageCollection: |
| 199 | +#### Open a Single Image |
| 200 | + |
| 201 | +The helper functions work the same way for a single `ee.Image`. |
96 | 202 |
|
97 | 203 | ```python |
98 | | -i = ee.ImageCollection(ee.Image('LANDSAT/LC08/C02/T1_TOA/LC08_044034_20140318')) |
99 | | -ds = xarray.open_dataset(i, engine='ee') |
| 204 | +img = ee.Image('ECMWF/ERA5_LAND/MONTHLY_AGGR/202501') |
| 205 | +grid_params = helpers.extract_grid_params(img) |
| 206 | +ds = xr.open_dataset(img, engine='ee', **grid_params) |
| 207 | +``` |
| 208 | + |
| 209 | +#### Visualize a Single Time Slice |
| 210 | + |
| 211 | +Once you have your `xarray.Dataset`, you can visualize a single time slice of a variable to verify the results. This requires the `matplotlib` library, which is an optional dependency. |
| 212 | + |
| 213 | +If you don't have it installed, you can add it with pip: |
| 214 | + |
| 215 | +```shell |
| 216 | +pip install matplotlib |
100 | 217 | ``` |
101 | 218 |
|
102 | | -Open any Earth Engine ImageCollection to match an existing transform: |
| 219 | +Xarray's plotting functions expect dimensions in `(y, x)` order for 2D plots. Since the data is in `(x, y)` order, we use `.transpose()` to swap the axes for correct visualization. |
103 | 220 |
|
104 | 221 | ```python |
105 | | -raster = rioxarray.open_rasterio(...) # assume crs + transform is set |
106 | | -ds = xr.open_dataset( |
107 | | - 'ee://ECMWF/ERA5_LAND/HOURLY', |
108 | | - engine='ee', |
109 | | - geometry=tuple(raster.rio.bounds()), # must be in EPSG:4326 |
110 | | - projection=ee.Projection( |
111 | | - crs=str(raster.rio.crs), transform=raster.rio.transform()[:6] |
112 | | - ), |
| 222 | + |
| 223 | +# First, open a dataset using one of the methods above |
| 224 | +aoi = shapely.geometry.box(113.33, -43.63, 153.56, -10.66) # Australia |
| 225 | +grid_params = helpers.fit_geometry( |
| 226 | + geometry=aoi, |
| 227 | + grid_crs='EPSG:4326', |
| 228 | + grid_shape=(256, 256) |
113 | 229 | ) |
| 230 | +ds = xr.open_dataset('ee://ECMWF/ERA5_LAND/MONTHLY_AGGR', engine='ee', **grid_params) |
| 231 | + |
| 232 | +# Select the 2m air temperature for the first time step |
| 233 | +temp_slice = ds['temperature_2m'].isel(time=0) |
| 234 | + |
| 235 | +# Transpose from (x, y) to (y, x) for correct plotting orientation and plot |
| 236 | +temp_slice.transpose('y', 'x').plot() |
114 | 237 | ``` |
115 | 238 |
|
116 | 239 | See [examples](https://github.com/google/Xee/tree/main/examples) or |
|