Commit 0f655af

author
Phil Varner
committed
initial commit
0 parents  commit 0f655af

2 files changed

+231
-0
lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
.ipynb_checkpoints/
es.ipynb

README.md

Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
# Aggregation Extension

The Aggregation Extension provides an endpoint similar to the Search endpoint (`/search`) that returns aggregated information about the matching Items rather than the Items themselves. The design is heavily influenced by the Elasticsearch aggregation endpoint, but uses a more regular structure for responses.

## STAC Endpoints

| Endpoint     | Returns               | Description |
| ------------ | --------------------- | ----------- |
| `/aggregate` | AggregationCollection | Retrieves an aggregation of the group of Items matching the provided predicates |

The `/aggregate` endpoint behaves similarly to the `/search` endpoint, but instead of returning an ItemCollection of Items, it returns aggregated information over the same matching Items in the form of an **AggregationCollection** of **Aggregation** entities.

If the `/aggregate` endpoint is implemented, it is **required** to add a link with the `rel` type set to `aggregate` to the `links` array in the root entity (`/`) that refers to the aggregate endpoint in the `href` property.

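A client could discover the endpoint by scanning the root document's `links` array for the `aggregate` relation. The following is a minimal sketch under that reading; the function name `find_aggregate_href` and the example URLs are hypothetical and not part of this extension.

```python
from typing import Optional


def find_aggregate_href(root: dict) -> Optional[str]:
    """Return the href of the first link with rel="aggregate", or None."""
    for link in root.get("links", []):
        if link.get("rel") == "aggregate":
            return link.get("href")
    return None


# Hypothetical root document; the URLs are placeholders.
example_root = {
    "links": [
        {"rel": "self", "href": "https://example.com/"},
        {"rel": "search", "href": "https://example.com/search"},
        {"rel": "aggregate", "href": "https://example.com/aggregate"},
    ]
}

print(find_aggregate_href(example_root))
```

A root document without such a link yields `None`, which a client can treat as "aggregation not supported".
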
`/aggregatables` link and endpoint?

The Query and Filter extensions can be added by appending `#query` and `#filter` to the conformance class URI.

## Filter Parameters and Fields

The filters for `/aggregate` are the same as those for `/search`, limited to those that are semantically meaningful for aggregation (e.g., `limit` has no meaning when aggregating). These filters are passed as query string parameters or JSON entity fields. For filters that represent a set of values, query parameters should use comma-separated string values and JSON entity attributes should use JSON Arrays.

| Parameter    | Type             | Description |
| ------------ | ---------------- | ----------- |
| datetime     | string           | Single date+time, or a range ('/' separator), formatted to [RFC 3339, section 5.6](https://tools.ietf.org/html/rfc3339#section-5.6). Use double dots `..` for open date ranges. |
| bbox         | \[number]        | Requested bounding box. Represented using either 2D or 3D geometries. The length of the array must be 2*n where n is the number of dimensions. The array contains all axes of the southwesterly most extent followed by all axes of the northeasterly most extent specified in Longitude/Latitude or Longitude/Latitude/Elevation based on [WGS 84](http://www.opengis.net/def/crs/OGC/1.3/CRS84). When using 3D geometries, the elevation of the southwesterly most extent is the minimum elevation in meters and the elevation of the northeasterly most extent is the maximum. |
| intersects   | GeoJSON Geometry | Searches items by performing intersection between their geometry and provided GeoJSON geometry. All GeoJSON geometry types must be supported. |
| ids          | \[string]        | Array of Item ids to return. All other filter parameters that further restrict the number of search results (except `next` and `limit`) are ignored. |
| collections  | \[string]        | Array of Collection IDs to include in the search for items. Only Items in one of the provided Collections will be searched. |
| aggregations | \[string]        | A list of aggregations to compute and return. |

Only one of either **intersects** or **bbox** should be specified. If both are specified, a 400 Bad Request response should be returned.

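The encoding rules above (comma-separated strings for set-valued query parameters, a 2*n-length `bbox`, and the **intersects**/**bbox** exclusivity) can be sketched as follows. `BadRequest`, `validate_filters`, and `to_query_params` are hypothetical names, and serializing `intersects` as a JSON string in the query string is an assumption; the text above does not say how a geometry is passed in a query string.

```python
import json
from typing import Any, Dict


class BadRequest(ValueError):
    """Hypothetical error that a server would map to a 400 Bad Request."""


def validate_filters(body: Dict[str, Any]) -> None:
    """Check the bbox/intersects rules described in the parameter table."""
    if "bbox" in body and "intersects" in body:
        raise BadRequest("only one of 'bbox' or 'intersects' may be specified")
    bbox = body.get("bbox")
    # 2*n values: 4 for 2D geometries, 6 for 3D geometries.
    if bbox is not None and len(bbox) not in (4, 6):
        raise BadRequest("bbox must contain 4 (2D) or 6 (3D) numbers")


def to_query_params(body: Dict[str, Any]) -> Dict[str, str]:
    """Convert JSON entity fields into query string parameter values."""
    validate_filters(body)
    params: Dict[str, str] = {}
    for name, value in body.items():
        if name == "intersects":
            # Assumption: the geometry is sent as a compact JSON string.
            params[name] = json.dumps(value, separators=(",", ":"))
        elif isinstance(value, (list, tuple)):
            # Set-valued filters become comma-separated strings.
            params[name] = ",".join(str(v) for v in value)
        else:
            params[name] = str(value)
    return params


request_body = {
    "datetime": "2020-01-01T00:00:00Z/..",
    "bbox": [-10, -10, 10, 10],
    "collections": ["sentinel2_l1c", "landsat8_l1tp"],
    "aggregations": ["count", "collection"],
}
print(to_query_params(request_body))
```
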
**aggregations**: There are no named aggregations that must be implemented. All available aggregations should be advertised in the root `rel="aggregate"` link.

This is a list of recommended aggregations to implement:
* count (Single Value of type integer)
* collection (Term Count)
* cloud_cover (Discrete Range)
* datetime_min (Single Value of datetime)
* datetime_max (Single Value of datetime)
* datetime_frequency (Datetime Range, automatic interval detection): detects a reasonable interval based on the datetime range and distribution of the data. Implementation-specific.

Maybe these?

* datetime_yearly (Datetime Range, interval=year)
* datetime_quarterly (Datetime Range, interval=quarter)
* datetime_monthly (Datetime Range, interval=month)
* datetime_weekly (Datetime Range, interval=week)
* datetime_daily (Datetime Range, interval=day)
* datetime_hourly (Datetime Range, interval=hour)
* datetime_minutes (Datetime Range, interval=minute)
* datetime_seconds (Datetime Range, interval=second)

## AggregationCollection fields

This object describes a STAC AggregationCollection, which is the analog of an ItemCollection for the `/aggregate` operation.

| Field Name   | Type           | Description |
| ------------ | -------------- | ----------- |
| type         | string         | **REQUIRED** Always "AggregationCollection". |
| aggregations | \[Aggregation] | **REQUIRED** A possibly-empty array of Aggregations. |

## Aggregation fields

| Field Name | Type                         | Description |
| ---------- | ---------------------------- | ----------- |
| name       | string                       | **REQUIRED** The unique identifier of the aggregation. |
| data_type  | string                       | **REQUIRED** The data type of the aggregation. |
| buckets    | \[Bucket]                    | If the aggregation bucketizes Items, they are defined here. |
| overflow   | integer                      | The count of Items that were not categorized into any of the buckets defined by the `buckets` field. |
| value      | string or number or datetime | For a Single Value aggregation, a JSON-type representation of the result value. |

One of either **buckets** or **value** is required.

**name**: An identifier for the aggregation result. Should be identical to the value passed to the `aggregations` query parameter.

**data_type**: One of numeric, string, datetime, interval_year, interval_month, interval_week, interval_day, interval_hour, interval_minute, interval_second.

**buckets**: If the aggregation is a Term Count, Datetime Range, or Discrete Range, these are the "buckets" into which each matching Item is categorized.

**overflow**: Some implementation data stores may have limitations on the aggregation queries that can be performed on them. For example, Elasticsearch limits the number of buckets for a query to 10,000 for performance reasons. Overflow indicates that there were Items matched by the query that are not accounted for in the count of any of the response buckets.

**value**: For Single Value aggregations, this is a representation of the result value as the equivalent JSON type. If the type of the value being aggregated over is a datetime, this is an RFC 3339 datetime, e.g., "2020-08-12T19:06:09Z".

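The field rules above can be checked mechanically. The sketch below is one possible reading, not a normative validator: `check_aggregation` is a hypothetical helper, and it assumes "one of either **buckets** or **value**" means exactly one of the two. It follows the field names from the Aggregation fields table (`name`, `data_type`).

```python
from typing import Any, Dict, List


def check_aggregation(agg: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an Aggregation object (empty if none)."""
    problems: List[str] = []
    for field in ("name", "data_type"):
        if not isinstance(agg.get(field), str):
            problems.append(f"'{field}' is required and must be a string")
    # Assumption: exactly one of 'buckets' or 'value' must be present.
    if ("buckets" in agg) == ("value" in agg):
        problems.append("exactly one of 'buckets' or 'value' is required")
    if "overflow" in agg and not isinstance(agg["overflow"], int):
        problems.append("'overflow' must be an integer")
    return problems


single_value = {
    "name": "datetime_max",
    "data_type": "datetime",
    "value": "2020-08-12T19:06:09Z",
}
print(check_aggregation(single_value))
print(check_aggregation({"name": "count", "data_type": "numeric"}))
```
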
## Bucket fields

| Field Name | Type    | Aggregation Types | Description |
| ---------- | ------- | ----------------- | ----------- |
| key        | string  | all               |             |
| data_type  | string  | all               |             |
| frequency  | integer | all               |             |
| from       | numeric | all               |             |
| to         | numeric | all               |             |

## Aggregation Types

### Single Value Aggregation

A Single Value aggregation is effectively a single Term Count bucket lifted up one level.

**todo** (diff for String, Numeric, and Datetime)

Example:

    {
      "type": "AggregationCollection",
      "aggregations": [
        {
          "key": "datetime_min",
          "value": "2000-02-16T00:00:00.000Z",
          "value_as_type": 9.506592E+11
        }
      ]
    }

### Term Count Aggregation

Counts the occurrences of each enumerated value, producing multiple buckets, one per unique value.

Example:

    {
      "type": "AggregationCollection",
      "aggregations": [
        {
          "key": "collections",
          "buckets": [
            {
              "key": "sentinel2_l1c",
              "value": "12649072",
              "value_as_type": 12649072
            },
            {
              "key": "landsat8_l1tp",
              "value": "1071997",
              "value_as_type": 1071997
            }
          ],
          "overflow": 23414
        }
      ]
    }

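Given the description of `overflow`, a client can estimate the total number of matched Items from a Term Count response by adding the bucket counts to the overflow. The sketch below assumes each Item falls into at most one bucket and reads the count from `value_as_type`, as in the example above; `total_matched` is a hypothetical helper name.

```python
def total_matched(aggregation: dict) -> int:
    """Sum the bucket counts and add any overflow reported by the server."""
    counted = sum(bucket["value_as_type"] for bucket in aggregation.get("buckets", []))
    return counted + aggregation.get("overflow", 0)


# The aggregation object from the Term Count example.
collections_aggregation = {
    "key": "collections",
    "buckets": [
        {"key": "sentinel2_l1c", "value": "12649072", "value_as_type": 12649072},
        {"key": "landsat8_l1tp", "value": "1071997", "value_as_type": 1071997},
    ],
    "overflow": 23414,
}

print(total_matched(collections_aggregation))  # 13744483
```
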
### Discrete Range Aggregation

Fields:
* key (string)
* key_as_type ()
* from (optional, inclusive; missing indicates an open interval)
* to (optional, exclusive; missing indicates an open interval)
* value (integer)

Example:

    {
      "type": "AggregationCollection",
      "aggregations": [
        {
          "key": "cloud_cover",
          "buckets": [
            {
              "key": "*-5.0",
              "to": 5,
              "value": "8644819",
              "value_as_type": 8644819
            },
            {
              "key": "5.0-10.0",
              "from": 5,
              "to": 10,
              "value": "5644819",
              "value_as_type": 5644819
            },
            {
              "key": "10.0-*",
              "from": 10,
              "value": "7644819",
              "value_as_type": 7644819
            }
          ]
        }
      ]
    }

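The `from`/`to` semantics listed under Fields (inclusive lower bound, exclusive upper bound, a missing bound meaning an open interval) determine which bucket a given value belongs to. The following is a minimal sketch of that membership test; `bucket_for` is a hypothetical helper, not something defined by this extension.

```python
from typing import List, Optional


def bucket_for(value: float, buckets: List[dict]) -> Optional[dict]:
    """Return the first bucket whose [from, to) interval contains value."""
    for bucket in buckets:
        # A missing bound means the interval is open on that side.
        lower_ok = "from" not in bucket or value >= bucket["from"]
        upper_ok = "to" not in bucket or value < bucket["to"]
        if lower_ok and upper_ok:
            return bucket
    return None


# Bucket bounds taken from the cloud_cover example.
cloud_cover_buckets = [
    {"key": "*-5.0", "to": 5},
    {"key": "5.0-10.0", "from": 5, "to": 10},
    {"key": "10.0-*", "from": 10},
]

for cloud_cover in (4.9, 5, 10):
    print(cloud_cover, bucket_for(cloud_cover, cloud_cover_buckets)["key"])
```

Because the upper bound is exclusive, a value of exactly 5 lands in `5.0-10.0` rather than `*-5.0`.
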
### Datetime Range Aggregation

Fields (datetimes are RFC 3339 string values):

* key (string)
* key_as_type (datetime in milliseconds?)
* value (integer) (ES: doc_count)

Example:

    {
      "type": "AggregationCollection",
      "aggregations": [
        {
          "key": "datetime_yearly",
          "buckets": [
            {
              "key": "2000-01-01T00:00:00.000Z",
              "key_as_type": 946684800000,
              "to": 5,
              "value": "8644819",
              "value_as_type": 8644819
            },
            {
              "key": "2001-01-01T00:00:00.000Z",
              "key_as_type": 978307200000,
              "from": 5,
              "to": 10,
              "value": "5644819",
              "value_as_type": 5644819
            },
            {
              "key": "2002-01-01T00:00:00.000Z",
              "key_as_type": 1009843200000,
              "from": 10,
              "value": "7644819",
              "value_as_type": 7644819
            }
          ],
          "interval": "year",
          "overflow": 98373
        }
      ]
    }
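
The `key_as_type` values in this example are consistent with the `key` datetimes expressed as milliseconds since the Unix epoch. Assuming that interpretation (the field list above still marks it as an open question), the conversion can be sketched as follows; `rfc3339_to_epoch_millis` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def rfc3339_to_epoch_millis(key: str) -> int:
    """Convert an RFC 3339 UTC datetime string to milliseconds since the epoch."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    parsed = datetime.fromisoformat(key.replace("Z", "+00:00"))
    # Integer division of timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


for key in (
    "2000-01-01T00:00:00.000Z",
    "2001-01-01T00:00:00.000Z",
    "2002-01-01T00:00:00.000Z",
):
    print(key, rfc3339_to_epoch_millis(key))
```
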
