| 1 | +# Aggregation Extension |
| 2 | + |
| 3 | +The Aggregation Extension provides an endpoint similar to the Search endpoint (`/search`) that returns aggregated information about matching Items rather than the Items themselves. The design is heavily influenced by the Elasticsearch aggregations API, but uses a more regular response structure. |
| 4 | + |
| 5 | +## STAC Endpoints |
| 6 | + |
| 7 | +| Endpoint | Returns | Description | |
| 8 | +| ------------ | -------------------------------------------------------------- | ----------- | |
| 9 | +| `/aggregate` | AggregationCollection | Retrieves an aggregation of the group of Items matching the provided predicates | |
| 10 | + |
| 11 | +The `/aggregate` endpoint behaves similarly to the `/search` endpoint, but instead of returning an ItemCollection of Items, it returns aggregated information over the same matching Items in the form of an **AggregationCollection** of **Aggregation** entities. |
| 12 | + |
| 13 | +If the `/aggregate` endpoint is implemented, it is **required** to add a link with the `rel` type set to `aggregate` to the `links` array in the root entity (`/`). The link's `href` property must refer to the aggregate endpoint. |
| 14 | + |
| 15 | +`/aggregatables` link and endpoint? |
| 16 | + |
| 17 | +Support for the Query and Filter extensions can be indicated by appending `#query` and `#filter` to the conformance class URI. |
| 18 | + |
| 19 | +## Filter Parameters and Fields |
| 20 | + |
| 21 | +The filters for `/aggregate` are the subset of those for `/search` that are semantically meaningful for aggregation (e.g., `limit` has no meaning when doing aggregations). These filters are passed as query string parameters or JSON |
| 22 | +entity fields. For filters that represent a set of values, query parameters should use comma-separated |
| 23 | +string values and JSON entity attributes should use JSON Arrays. |
| 24 | + |
| 25 | +| Parameter | Type | Description | |
| 26 | +| ----------- | ---------------- | ----------- | |
| 27 | +| datetime | string | Single date+time, or a range ('/' separator), formatted to [RFC 3339, section 5.6](https://tools.ietf.org/html/rfc3339#section-5.6). Use double dots `..` for open date ranges. | |
| 28 | +| bbox | \[number] | Requested bounding box. Represented using either 2D or 3D geometries. The length of the array must be 2*n where n is the number of dimensions. The array contains all axes of the southwesterly most extent followed by all axes of the northeasterly most extent specified in Longitude/Latitude or Longitude/Latitude/Elevation based on [WGS 84](http://www.opengis.net/def/crs/OGC/1.3/CRS84). When using 3D geometries, the elevation of the southwesterly most extent is the minimum elevation in meters and the elevation of the northeasterly most extent is the maximum. | |
| 29 | +| intersects | GeoJSON Geometry | Searches items by performing intersection between their geometry and provided GeoJSON geometry. All GeoJSON geometry types must be supported. | |
| 30 | +| ids | \[string] | Array of Item ids to return. All other filter parameters that further restrict the number of search results (except `next` and `limit`) are ignored | |
| 31 | +| collections | \[string] | Array of Collection IDs to include in the search for items. Only Items in one of the provided Collections will be searched | |
| 32 | +| aggregations | \[string] | A list of aggregations to compute and return | |
| 33 | + |
| 34 | +Only one of either **intersects** or **bbox** should be specified. If both are specified, a 400 Bad Request response should be returned. |
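A minimal sketch of the parameter rules above, assuming a Python implementation. The function names `validate_aggregate_params` and `to_query_params` are hypothetical and not part of this extension; the sketch only covers the `intersects`/`bbox` exclusivity rule, the `bbox` length rule, and the comma-separated encoding of set-valued query parameters.

```python
from typing import Any, Mapping


def validate_aggregate_params(params: Mapping[str, Any]) -> tuple[int, str]:
    """Return a (status_code, message) pair for an /aggregate request (hypothetical helper)."""
    # Only one of intersects or bbox should be specified; both together is a 400.
    if params.get("intersects") is not None and params.get("bbox") is not None:
        return 400, "Only one of 'intersects' or 'bbox' may be specified"
    # bbox length must be 2*n, where n is the number of dimensions (2 or 3).
    bbox = params.get("bbox")
    if bbox is not None and len(bbox) not in (4, 6):
        return 400, "bbox must contain 4 (2D) or 6 (3D) numbers"
    return 200, "ok"


def to_query_params(params: Mapping[str, Any]) -> dict[str, str]:
    """Encode set-valued filters as comma-separated strings for a GET request."""
    encoded = {}
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            encoded[key] = ",".join(str(v) for v in value)
        else:
            encoded[key] = str(value)
    return encoded
```

For example, `to_query_params({"collections": ["a", "b"]})` yields `{"collections": "a,b"}`, while the same filter in a JSON body would stay a JSON Array.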
| 35 | + |
| 36 | +**aggregations**: There are no named aggregations that must be implemented. All aggregations which are available should be advertised in the root `rel="aggregate"` link. |
| 37 | + |
| 38 | +This is a list of recommended aggregations to implement: |
| 39 | +* count (Single Value of type integer) |
| 40 | +* collection (Term Count) |
| 41 | +* cloud_cover (Discrete Range) |
| 42 | +* datetime_min (Single Value of datetime) |
| 43 | +* datetime_max (Single Value of datetime) |
| 44 | +* datetime_frequency (Datetime Range, automatic interval detection) -- detect a reasonable interval based on the datetime range and distribution of data. Implementation specific. |
| 45 | + |
| 46 | +maybe these? |
| 47 | + |
| 48 | +* datetime_yearly (Datetime Range, interval=year) |
| 49 | +* datetime_quarterly (Datetime Range, interval=quarter) |
| 50 | +* datetime_monthly (Datetime Range, interval=month) |
| 51 | +* datetime_weekly (Datetime Range, interval=week) |
| 52 | +* datetime_daily (Datetime Range, interval=day) |
| 53 | +* datetime_hourly (Datetime Range, interval=hour) |
| 54 | +* datetime_minutes (Datetime Range, interval=minute) |
| 55 | +* datetime_seconds (Datetime Range, interval=second) |
| 56 | + |
| 57 | +## AggregationCollection fields |
| 58 | + |
| 59 | +This object describes a STAC AggregationCollection, which is the analog of an ItemCollection for the `/aggregate` operation. |
| 60 | + |
| 61 | +| Field Name | Type | Description | |
| 62 | +| --------------- | -------------- | ----------- | |
| 63 | +| type | string | **REQUIRED** Always "AggregationCollection". | |
| 64 | +| aggregations | \[Aggregation] | **REQUIRED** A possibly-empty array of Aggregations. | |
| 65 | + |
| 66 | +## Aggregation fields |
| 67 | + |
| 68 | +| Field Name | Type | Description | |
| 69 | +| --------------- | -------------- | ----------- | |
| 70 | +| name | string | **REQUIRED** The unique identifier of the aggregation. | |
| 71 | +| data_type | string | **REQUIRED** The data type of the aggregation. | |
| 72 | +| buckets | \[Bucket] | If the aggregation bucketizes Items, they are defined here. | |
| 73 | +| overflow | integer | The count of Items that were not categorized into any of the buckets defined by the `buckets` field | |
| 74 | +| value | string or number or datetime | For a Single Value aggregation, a JSON-type representation of the result value. | |
| 75 | + |
| 76 | +One of either **buckets** or **value** is required. |
| 77 | + |
| 78 | +**name** An identifier for the aggregation result. Should be identical to the value passed to the `aggregations` query parameter. |
| 79 | + |
| 80 | +**data_type** One of: numeric, string, datetime, interval_year, interval_month, interval_week, |
| 81 | +interval_day, interval_hour, interval_minute, interval_second. |
| 82 | + |
| 83 | +**buckets** If the aggregation is a Term Count, Datetime Range, or Discrete Range, these are the "buckets" into which each matching Item is categorized. |
| 84 | + |
| 85 | +**overflow** Some implementation data stores may have limitations on the aggregation queries that can be performed on them. For example, Elasticsearch limits the number of buckets for a query to 10,000 for performance reasons. Overflow indicates that there were Items matched by the query that are not accounted for in the count of any of the response buckets. |
| 86 | + |
| 87 | +**value** For Single Value aggregations, this is a representation of the result value as the equivalent JSON type. If the type of the value being aggregated over is a datetime, this is an RFC 3339 datetime, e.g., "2020-08-12T19:06:09Z". |
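The field rules above could be checked on the client side roughly as follows. This is a sketch, not a normative validator: `check_aggregation` is a hypothetical helper, and it reads "one of either **buckets** or **value** is required" as "at least one must be present", which is an assumption.

```python
def check_aggregation(agg: dict) -> list[str]:
    """Return a list of problems found in a single Aggregation object (hypothetical helper)."""
    errors = []
    # name and data_type are the two REQUIRED string fields.
    for field in ("name", "data_type"):
        if not isinstance(agg.get(field), str):
            errors.append(f"missing or non-string required field: {field}")
    # Assumption: at least one of buckets or value must be present.
    if "buckets" not in agg and "value" not in agg:
        errors.append("one of 'buckets' or 'value' is required")
    return errors
```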
| 88 | + |
| 89 | +## Bucket fields |
| 90 | + |
| 91 | +| Field Name | Type | Aggregation Types | Description | |
| 92 | +| --------------- | -------------- | ----------------- | ----------- | |
| 93 | +| key | string | all | | |
| 94 | +| data_type | string | all | | |
| 95 | +| frequency | integer | all | | |
| 96 | +| from | numeric | all | | |
| 97 | +| to | numeric | all | | |
| 98 | + |
| 99 | +## Aggregation Types |
| 100 | + |
| 101 | +### Single Value Aggregation |
| 102 | + |
| 103 | +A Single Value aggregation is effectively a single Term Count bucket lifted up one level. |
| 104 | + |
| 105 | +**todo** (diff for String, Numeric, and Datetime) |
| 106 | + |
| 107 | +Example: |
| 108 | + { |
| 109 | + "type": "AggregationCollection", |
| 110 | + "aggregations": [ |
| 111 | + { |
| 112 | + "key": "datetime_min", |
| 113 | + "value": "2000-02-16T00:00:00.000Z", |
| 114 | + "value_as_type": 950659200000 |
| 115 | + } |
| 116 | + ] |
| 117 | + } |
| 118 | + |
| 119 | +### Term Count Aggregation |
| 120 | + |
| 121 | +- A multi-bucket count over an enumerated field, with one bucket per unique value. |
| 122 | + |
| 123 | +Example: |
| 124 | + { |
| 125 | + "type": "AggregationCollection", |
| 126 | + "aggregations": [ |
| 127 | + { |
| 128 | + "key": "collections", |
| 129 | + "buckets": [ |
| 130 | + { |
| 131 | + "key": "sentinel2_l1c", |
| 132 | + "value": "12649072", |
| 133 | + "value_as_type": 12649072 |
| 134 | + }, |
| 135 | + { |
| 136 | + "key": "landsat8_l1tp", |
| 137 | + "value" : "1071997", |
| 138 | + "value_as_type": 1071997 |
| 139 | + } |
| 140 | + ], |
| 141 | + "overflow": 23414 |
| 142 | + } |
| 143 | + ] |
| 144 | + } |
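As a rough illustration of how `overflow` relates to the bucket counts, the sketch below sums the bucket counts of a Term Count aggregation and adds the overflow to estimate the total number of matched Items. It assumes the draft field names used in the example above (`value_as_type` for the numeric count); `total_matched` is a hypothetical helper name.

```python
def total_matched(aggregation: dict) -> int:
    """Sum the bucket counts and add any Items reported as overflow."""
    # Assumption: each bucket carries its count as a number in "value_as_type",
    # as in the draft example; the Bucket fields table may name this differently.
    counted = sum(bucket["value_as_type"] for bucket in aggregation.get("buckets", []))
    # overflow is optional; treat a missing value as zero uncounted Items.
    return counted + aggregation.get("overflow", 0)


example = {
    "key": "collections",
    "buckets": [
        {"key": "sentinel2_l1c", "value": "12649072", "value_as_type": 12649072},
        {"key": "landsat8_l1tp", "value": "1071997", "value_as_type": 1071997},
    ],
    "overflow": 23414,
}
print(total_matched(example))  # 13744483
```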
| 145 | + |
| 146 | +### Discrete Range Aggregation |
| 147 | + |
| 148 | +Fields: |
| 149 | +* key (string) |
| 150 | +* key_as_type () |
| 151 | +* from (optional, missing indicates an open interval) inclusive |
| 152 | +* to (optional, missing indicates an open interval) exclusive |
| 153 | +* value (integer) |
| 154 | + |
| 155 | +Example: |
| 156 | + { |
| 157 | + "type": "AggregationCollection", |
| 158 | + "aggregations": [ |
| 159 | + { |
| 160 | + "key": "cloud_cover", |
| 161 | + "buckets": [ |
| 162 | + { |
| 163 | + "key": "*-5.0", |
| 164 | + "to": 5, |
| 165 | + "value" : "8644819", |
| 166 | + "value_as_type" : 8644819 |
| 167 | + }, |
| 168 | + { |
| 169 | + "key": "5.0-10.0", |
| 170 | + "from": 5, |
| 171 | + "to": 10, |
| 172 | + "value" : "5644819", |
| 173 | + "value_as_type" : 5644819 |
| 174 | + }, |
| 175 | + { |
| 176 | + "key": "10.0-*", |
| 177 | + "from": 10, |
| 178 | + "value" : "7644819", |
| 179 | + "value_as_type" : 7644819 |
| 180 | + } |
| 181 | + ] |
| 182 | + } |
| 183 | + ] |
| 184 | + } |
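The interval semantics listed above (`from` inclusive, `to` exclusive, a missing bound meaning an open interval) can be sketched as a lookup over the buckets. `find_bucket` is a hypothetical helper used only for illustration, and the bucket list mirrors the `cloud_cover` example.

```python
from typing import Optional


def find_bucket(value: float, buckets: list[dict]) -> Optional[str]:
    """Return the key of the first bucket whose [from, to) interval contains value."""
    for bucket in buckets:
        # A missing "from" or "to" means the interval is open on that side.
        lower_ok = "from" not in bucket or value >= bucket["from"]
        upper_ok = "to" not in bucket or value < bucket["to"]
        if lower_ok and upper_ok:
            return bucket["key"]
    return None


cloud_cover_buckets = [
    {"key": "*-5.0", "to": 5},
    {"key": "5.0-10.0", "from": 5, "to": 10},
    {"key": "10.0-*", "from": 10},
]
```

With these buckets, a value of exactly 5 falls into `5.0-10.0` rather than `*-5.0`, because `to` is exclusive and `from` is inclusive.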
| 185 | + |
| 186 | +### Datetime Range Aggregation |
| 187 | + |
| 188 | +Fields: |
| 189 | +Datetimes are RFC 3339 string values. |
| 190 | + |
| 191 | +* key (string) |
| 192 | +* key_as_type (datetime in milliseconds?) |
| 193 | +* value (integer) (ES: doc_count) |
| 194 | + |
| 195 | +Example: |
| 196 | + { |
| 197 | + "type": "AggregationCollection", |
| 198 | + "aggregations": [ |
| 199 | + { |
| 200 | + "key": "datetime_yearly", |
| 201 | + "buckets": [ |
| 202 | + { |
| 203 | + "key": "2000-01-01T00:00:00.000Z", |
| 204 | + "key_as_type": 946684800000, |
| 205 | + "to": 5, |
| 206 | + "value" : "8644819", |
| 207 | + "value_as_type" : 8644819 |
| 208 | + }, |
| 209 | + { |
| 210 | + "key": "2001-01-01T00:00:00.000Z", |
| 211 | + "key_as_type": 978307200000, |
| 212 | + "from": 5, |
| 213 | + "to": 10, |
| 214 | + "value" : "5644819", |
| 215 | + "value_as_type" : 5644819 |
| 216 | + }, |
| 217 | + { |
| 218 | + "key": "2002-01-01T00:00:00.000Z", |
| 219 | + "key_as_type": 1009843200000, |
| 220 | + "from": 10, |
| 221 | + "value" : "7644819", |
| 222 | + "value_as_type" : 7644819 |
| 223 | + } |
| 224 | + ], |
| 225 | + "interval": "year", |
| 226 | + "overflow": 98373 |
| 227 | + } |
| 228 | + ] |
| 229 | + } |