
Commit 3afe8b2

icg: partial integration of data provider requirements
1 parent d4bb9fe commit 3afe8b2

2 files changed (+90, -8 lines)

guides/file_formats.qmd

Lines changed: 50 additions & 7 deletions
@@ -33,14 +33,19 @@ challenging to use.
 An exception to this are the 'RGB' style products, where three bands are used to represent a single image. In this case,
 creating a Cloud Optimised GeoTIFF with three bands is an option.
 
-For associating time information, create one GeoTIFF per timestamp, and one STAC item per timestamp. The GeoTIFF format has
-not built-in support for conveying time information, but STAC metadata is supporting this very well.
+For associating time information, create one GeoTIFF per timestamp, and one STAC item per timestamp. The GeoTIFF format
+has no built-in support for conveying time information, but STAC metadata supports this well.
 
 ### Visualisation in APEx Geospatial Explorer
 
-To optimise visualisation in the APEx Geospatial Explorer, additional guidelines have been established. Adhering to these
-guidelines will ensure that the data is effectively optimised for visualisation on a map. Please refer to
-[this page](../interoperability/geospatial_explorer.qmd#cloud-optimized-geotiff-cog) for more information.
+To optimise visualisation in the APEx Geospatial Explorer, it is recommended to use the GoogleMapsCompatible tiling scheme:
+typically 256x256 pixel tiles aligned to a global grid. The default Coordinate Reference System (CRS) used in the Geospatial
+Explorer is the Web Mercator projection (EPSG:3857), so all datasets in this projection are supported. On-the-fly
+reprojection and/or configuration of a Geospatial Explorer instance for an alternative CRS is feasible, although we advise
+contacting the APEx team for specific advice when using alternative projections. The BitsPerSample field must accurately
+reflect the data format. Overviews are essential for performance and should be generated by downsampling by factors of
+two until the image dimensions are the size of a tile or smaller. These overviews should also be tiled and placed after
+the main image data to conform with the COG specification.
 
 ## (Geo-)Zarr
 
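The overview rule in this hunk (downsample by factors of two until the image fits within a single tile) can be made concrete with a short Python sketch. It uses only the standard library and assumes the 256x256 tile size mentioned for the GoogleMapsCompatible scheme; `overview_factors` is a hypothetical helper, not part of GDAL, rasterio or any APEx tooling.

```python
import math


def overview_factors(width: int, height: int, tile_size: int = 256) -> list[int]:
    """Power-of-two overview factors for a raster of the given size.

    Levels are added until the most reduced level fits within a single
    tile_size x tile_size tile. An image that already fits in one tile
    needs no overviews, so an empty list is returned.
    """
    if width <= 0 or height <= 0 or tile_size <= 0:
        raise ValueError("width, height and tile_size must be positive")
    factors = []
    factor = 1
    # While the current level is larger than one tile in either dimension,
    # add another level at twice the previous decimation factor.
    while math.ceil(width / factor) > tile_size or math.ceil(height / factor) > tile_size:
        factor *= 2
        factors.append(factor)
    return factors


if __name__ == "__main__":
    print(overview_factors(10000, 8000))
```

For a 10000x8000 pixel image this yields `[2, 4, 8, 16, 32, 64]`, the last level being 157x125 pixels. Factors of this form are what overview tools such as `gdaladdo` or rasterio's `build_overviews` take as input; GDAL's COG driver, with its `TILING_SCHEME=GoogleMapsCompatible` creation option, can also generate overviews itself, which is likely the more practical route.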
@@ -61,6 +66,44 @@ At the time of writing, there are, however these important caveats:
 
 ## NetCDF
 
-NetCDF is a self-describing format with some properties similar to Zarr, but less optimised for cloud access. It can be useful
-for exchanging data cubes as single files through traditional methods. However, it is less recommended for convenient
+NetCDF is a self-describing format with some properties similar to Zarr, but less optimised for cloud access. It can be
+useful for exchanging data cubes as single files through traditional methods. However, it is less recommended for convenient
 sharing of large datasets, for which either COG or Zarr provide better options.
+
+## Statistical Datasets (FlatGeobuf, GeoJSON)
+
+Statistical datasets can be used to store precomputed statistics for dataset variables based on spatial units, such as
+administrative areas. An example is land cover statistics collected using the boundaries from the Nomenclature of
+Territorial Units for Statistics (NUTS), as shown in the [APEx Geospatial Explorer](https://explorer.apex.esa.int/)
+(Statistics). The guidelines in this section focus on supporting the integration of statistical data for visualisation
+in the APEx Geospatial Explorer.
+
+The statistical datasets are expected to be vector layers provided in a format that can be parsed into a feature
+collection following the GeoJSON [@geojson] specification. The formats currently tested and supported are GeoJSON [@geojson]
+and FlatGeobuf [@flatgeobuf]. FlatGeobuf should be used for large statistical datasets, as it allows the relevant
+features to be streamed without downloading the full dataset, which improves performance.
+
+The metadata header of the file should contain the following properties, which define the fields on the features in the
+dataset to be used for each purpose:
+
+- identifierKey: The name of the field that stores the unique identifier for each feature.
+- nameKey: The name of the field that stores the human-readable name for display.
+- levelKey: The name of the field that stores the administrative level number.
+- childrenKey: The name of the field that holds a comma-separated list of child feature IDs, as declared in identifierKey.
+  Can be the empty string if this is the bottom level.
+- attributeKeys: A comma-separated list of the field names that store the statistical data.
+- units: The units as displayed in the UI. This is for UI purposes only and has no effect on the data.
+- visualization_hint: One of histogram, categorised, or continuous, used as a hint to the UI to choose a suitable
+  presentation for the data.
+
+For example, with the following properties defined in the file metadata:
+
+- identifierKey: NUTS_ID
+- nameKey: NUTS_NAME
+- levelKey: LEVL_CODE
+- childrenKey: children
+- attributeKeys: Trees, Shrubland, Grassland
+- visualization_hint: categorised
+
+the Geospatial Explorer would use the fields NUTS_ID, NUTS_NAME, … in the data to determine the navigation and display of
+statistics. For further guidance, please contact the APEx team through the [APEx User Forum](http://forum.apex.esa.int/).
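The header properties listed in the Statistical Datasets section lend themselves to a simple consistency check before a dataset is shared. The Python sketch below is a hypothetical, standard-library-only validator: it assumes the header has already been read into a dict of strings (how that dict is obtained from a FlatGeobuf or GeoJSON file is not specified in the guide and is left out), and that features are GeoJSON-like dicts with a `properties` member. Because the guide's own example omits `units`, this sketch treats `units` and `visualization_hint` as optional. The function names are illustrative only.

```python
VALID_HINTS = {"histogram", "categorised", "continuous"}
REQUIRED_KEYS = ("identifierKey", "nameKey", "levelKey", "childrenKey", "attributeKeys")


def split_list(value: str) -> list[str]:
    """Split a comma-separated string, trimming whitespace and dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_statistics_metadata(metadata: dict, features: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in metadata:
            problems.append(f"missing header property: {key}")
    # units and visualization_hint are treated as optional in this sketch.
    hint = metadata.get("visualization_hint")
    if hint is not None and hint not in VALID_HINTS:
        problems.append(f"unexpected visualization_hint: {hint}")
    if problems:
        return problems

    id_key = metadata["identifierKey"]
    children_key = metadata["childrenKey"]
    fields = [id_key, metadata["nameKey"], metadata["levelKey"]]
    fields += split_list(metadata["attributeKeys"])
    if children_key:
        fields.append(children_key)

    known_ids = {feature["properties"].get(id_key) for feature in features}
    for feature in features:
        props = feature["properties"]
        fid = props.get(id_key)
        for field in fields:
            if field not in props:
                problems.append(f"{fid}: missing field {field}")
        # Children must reference identifiers of other features; an empty
        # value marks a feature at the bottom level.
        raw_children = (props.get(children_key) or "") if children_key else ""
        for child in split_list(raw_children):
            if child not in known_ids:
                problems.append(f"{fid}: unknown child {child}")
    return problems
```

For the NUTS example in the guide, the header dict would map `identifierKey` to `NUTS_ID`, `attributeKeys` to `"Trees, Shrubland, Grassland"`, and so on. The sketch strips the whitespace after each comma before looking up fields, which is an assumption about how the list is meant to be parsed.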

interoperability/datahosting.md

Lines changed: 40 additions & 1 deletion
@@ -44,7 +44,7 @@ fostering wider adoption and enabling advanced use cases in downstream applicati
 <tr>
 <td>DATA-REQ-03</td>
 <td>EO project results should be accompanied with metadata in a STAC [@stac] format, including applicable STAC extensions.</td>
-<td>The specific STAC profiles will align with the recommendations provided by the <a href="https://eoresults.esa.int/reg-api/docs#/Implemented%20tranaction%20operations%3A/collection_items_post_request_collections__collectionId__items_post">ESA Project Results Repository (PRR)</a>. More details regarding which profiles to apply will be added as the project progresses.</td>
+<td>The specific STAC profiles will align with the recommendations provided in the [Metadata Recommendations](#metadata-recommendations) section.</td>
 </tr>
 </tbody>
 </table>
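The file format guide recommends one GeoTIFF and one STAC Item per timestamp, and DATA-REQ-03 asks for STAC metadata. The Python sketch below, using only the standard library, shows the minimal shape of such per-timestamp Items. The asset URLs, id prefix, bounding box and the `1.1.0` STAC version are placeholders, and `make_item` is a hypothetical helper; the extensions and profiles to add on top are those in the Metadata Recommendations section, and a dedicated library such as pystac would usually be preferable in practice.

```python
import json
from datetime import datetime, timezone

COG_MEDIA_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"


def bbox_to_polygon(bbox):
    """Turn [west, south, east, north] into a closed GeoJSON Polygon."""
    west, south, east, north = bbox
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {"type": "Polygon", "coordinates": [ring]}


def make_item(timestamp, href, bbox, prefix, stac_version="1.1.0"):
    """Build one STAC Item dict for a single-timestamp GeoTIFF.

    The timestamp must be timezone-aware; it is written in UTC because the
    GeoTIFF itself has no built-in place to store it.
    """
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = timestamp.astimezone(timezone.utc)
    return {
        "type": "Feature",
        "stac_version": stac_version,
        "id": f"{prefix}-{utc:%Y%m%dT%H%M%SZ}",
        "geometry": bbox_to_polygon(bbox),
        "bbox": list(bbox),
        "properties": {"datetime": utc.strftime("%Y-%m-%dT%H:%M:%SZ")},
        # self/parent/collection links are typically added when the Item
        # is placed in a catalogue.
        "links": [],
        "assets": {"data": {"href": href, "type": COG_MEDIA_TYPE, "roles": ["data"]}},
    }


if __name__ == "__main__":
    # Hypothetical monthly products: one COG and one Item per timestamp.
    stamps = [datetime(2024, month, 1, tzinfo=timezone.utc) for month in (1, 2, 3)]
    items = [
        make_item(t, f"https://example.com/landcover_{t:%Y%m%d}.tif", (4.0, 50.0, 5.0, 51.0), "landcover")
        for t in stamps
    ]
    print(json.dumps(items[0], indent=2))
```

Since the GeoTIFF cannot hold the acquisition time, the `datetime` property of each Item is what carries the time dimension for catalogue clients.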
@@ -56,3 +56,42 @@ Table: Interoperability requirements for data providers
 For more details regarding the recommended file formats and their usage within APEx, please refer to the
 [APEx File Format Recommendations](../guides/file_formats.qmd).
 :::
+
+
+## Metadata Recommendations
+
+### Format Specific Recommendations
+
+When sharing geospatial datasets in formats such as Cloud Optimised GeoTIFF (COG), NetCDF, and Zarr, it is essential
+to embed as much relevant metadata as possible directly within the files. Although these formats are designed for
+efficient data access, their interoperability potential is enhanced when the files carry rich, standardised metadata
+aligned with their respective specifications. Doing so not only improves data reuse by third-party tools but also
+enables more reliable automatic inference of STAC metadata during cataloguing or dataset publication.
+
+APEx recommends incorporating the following details into the file metadata:
+
+- The projection (coordinate reference system) used for the data within the file
+- The nodata value applied
+- The unit of measurement for the values represented in the dataset
+- For categorical data, a definition of the colour map or legend used to visualise the dataset
+- Band or variable names and descriptions
+
+For more details and examples on adding this metadata to your results, please consult the documentation of the specific
+tools (e.g. GDAL, rasterio, …) used to generate them.
+
+### STAC Metadata Recommendations
+
+The STAC specification provides a comprehensive and interoperable framework for describing geospatial datasets. Within
+APEx, STAC serves as the foundation for enhancing the discoverability, interoperability, and integration of data across
+a range of platforms and data catalogues, including the ESA Project Results Repository, and tools such as the APEx
+Geospatial Explorer.
+
+To enhance interoperability, data providers are advised to consistently use a recommended set of STAC extensions and
+best practices. These recommendations come from community input and collaboration with other initiatives, such as
+EarthCODE and EOEPCA, to ensure consistency across projects and promote the adoption of best practices.
+
+@tbl-metadata offers a summary of the suggested metadata. For further details, please refer to the resources listed below.
+
+- [STAC Best Practices](https://github.com/radiantearth/stac-best-practices/blob/main/README.md)
+- [EOEPCA+ Datacube Access Best Practices](https://github.com/EOEPCA/datacube-access/blob/main/best_practices/stac_best_practices.md)
+- [ESA PRR Collection Specifications](https://eoresults.esa.int/prr_collection_specifications.html)
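The Metadata Recommendations section notes that rich in-file metadata makes automatic inference of STAC metadata more reliable. The Python sketch below illustrates that step with plain dictionaries: it assumes the CRS, nodata value, unit, band names and (for categorical data) a colour table have already been read from the file with a tool such as GDAL or rasterio, and it reports which of the recommended items are missing. The output field names are assumptions based on the STAC projection extension v2 (`proj:code`), the STAC 1.1 common `bands` fields (`nodata`, `unit`) and the classification extension (`classification:classes`); the extensions and versions APEx actually expects are set by the best-practice resources listed in that section. The helper name and the input dictionary layout are hypothetical.

```python
def infer_stac_asset_fields(meta: dict) -> tuple[dict, list[str]]:
    """Map metadata read from a raster file to STAC asset fields.

    `meta` is a hypothetical, already-extracted summary with the keys
    "crs", "nodata", "unit", "bands" and, for categorical data, "colormap"
    ({pixel value: (label, (r, g, b))}). Returns the STAC fields plus the
    recommended items that were missing from the file.
    """
    missing = [key for key in ("crs", "nodata", "unit", "bands") if meta.get(key) in (None, "", [])]
    fields = {}
    if meta.get("crs"):
        fields["proj:code"] = meta["crs"]  # projection extension v2, e.g. "EPSG:3857"
    bands = []
    for band in meta.get("bands") or []:
        entry = {"name": band["name"]}
        if band.get("description"):
            entry["description"] = band["description"]
        if meta.get("nodata") is not None:
            entry["nodata"] = meta["nodata"]
        if meta.get("unit"):
            entry["unit"] = meta["unit"]
        bands.append(entry)
    if bands:
        fields["bands"] = bands  # STAC 1.1 common band metadata
    # A colour table for categorical data becomes classification classes;
    # color_hint is an upper-case hex RGB string without a leading "#".
    colormap = meta.get("colormap") or {}
    if colormap:
        fields["classification:classes"] = [
            {"value": value, "name": label, "color_hint": "%02X%02X%02X" % tuple(rgb)}
            for value, (label, rgb) in sorted(colormap.items())
        ]
    return fields, missing
```

Returning the missing items alongside the fields makes the check usable as a pre-publication lint: an empty `missing` list means every recommended item this sketch can detect is present in the file.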
