| 1 | +# Data filtering via CQL2 |
| 2 | + |
| 3 | +The system can generate CQL2 filters from the request context to provide row-level content filtering. These filters are applied to outgoing requests before they are sent to the upstream API. |
| 4 | + |
| 5 | +> [!IMPORTANT] |
| 6 | +> The upstream STAC API must support the [STAC API Filter Extension](https://github.com/stac-api-extensions/filter/blob/main/README.md), including the [Features Filter](http://www.opengis.net/spec/ogcapi-features-3/1.0/conf/features-filter) conformance class on the Features resource (`/collections/{cid}/items`)[^37]. |
| 7 | + |
| 8 | +## Filters |
| 9 | + |
| 10 | +### `ITEMS_FILTER` |
| 11 | + |
| 12 | +The [`ITEMS_FILTER`](../configuration.md#items_filter_cls) applies to the following operations. |
| 13 | + |
| 14 | +> [!WARNING] |
| 15 | +> Operations without a check mark are not yet supported. We intend to support them in the future. |
| 16 | + |
| 17 | +- [x] `GET /search` |
| 18 | + - **Action:** Read Item |
| 19 | + - **Strategy:** Append query params with generated CQL2 query. |
| 20 | +- [x] `POST /search` |
| 21 | + - **Action:** Read Item |
| 22 | + - **Strategy:** Append body with generated CQL2 query. |
| 23 | +- [x] `GET /collections/{collection_id}/items` |
| 24 | + - **Action:** Read Item |
| 25 | + - **Strategy:** Append query params with generated CQL2 query. |
| 26 | +- [x] `GET /collections/{collection_id}/items/{item_id}` |
| 27 | + - **Action:** Read Item |
| 28 | + - **Strategy:** Validate response against CQL2 query. |
| 29 | +- [ ] `POST /collections/{collection_id}/items`[^21] |
| 30 | + - **Action:** Create Item |
| 31 | + - **Strategy:** Validate body with generated CQL2 query. |
| 32 | +- [ ] `PUT /collections/{collection_id}/items/{item_id}`[^21] |
| 33 | + - **Action:** Update Item |
| 34 | + - **Strategy:** Fetch Item and validate CQL2 query; merge Item with body and validate with generated CQL2 query. |
| 35 | +- [ ] `DELETE /collections/{collection_id}/items/{item_id}`[^21] |
| 36 | + - **Action:** Delete Item |
| 37 | + - **Strategy:** Fetch Item and validate with CQL2 query. |
| 38 | +- [ ] `POST /collections/{collection_id}/bulk_items`[^21] |
| 39 | + - **Action:** Create Items |
| 40 | + - **Strategy:** Validate items in body with generated CQL2 query. |
| 41 | + |
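The two "Append" strategies listed above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the proxy's actual implementation: the helper names `append_filter_to_query` and `append_filter_to_body` are hypothetical, and combining the generated filter with any client-supplied filter via `AND` is an assumed policy.

```python
from typing import Any
from urllib.parse import parse_qsl, urlencode


def append_filter_to_query(query_string: str, cql2_text: str) -> str:
    """Return the query string with a generated cql2-text filter applied (GET)."""
    params = dict(parse_qsl(query_string, keep_blank_values=True))
    existing = params.get("filter")
    # Assumed policy: AND the generated filter with the client's own filter so
    # that the client can narrow, but never widen, the set of visible records.
    params["filter"] = f"({existing}) AND ({cql2_text})" if existing else cql2_text
    params["filter-lang"] = "cql2-text"
    return urlencode(params)


def append_filter_to_body(body: dict[str, Any], cql2_json: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a search body with a generated cql2-json filter applied (POST)."""
    existing = body.get("filter")
    combined = {"op": "and", "args": [existing, cql2_json]} if existing else cql2_json
    return {**body, "filter": combined, "filter-lang": "cql2-json"}
```

Both helpers leave the client's other parameters untouched, so paging and sorting options pass through to the upstream API unchanged.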
| 42 | +### `COLLECTIONS_FILTER` |
| 43 | + |
| 44 | +The [`COLLECTIONS_FILTER`](../configuration#collections_filter_cls) applies to the following operations. |
| 45 | + |
| 46 | +> [!WARNING] |
| 47 | +> Operations without a check mark are not yet supported. We intend to support them in the future. |
| 48 | + |
| 49 | +- [x] `GET /collections` |
| 50 | + - **Action:** Read Collection |
| 51 | + - **Strategy:** Append query params with generated CQL2 query. |
| 52 | +- [x] `GET /collections/{collection_id}` |
| 53 | + - **Action:** Read Collection |
| 54 | + - **Strategy:** Validate response against CQL2 query. |
| 55 | +- [ ] `POST /collections/`[^22] |
| 56 | + - **Action:** Create Collection |
| 57 | + - **Strategy:** Validate body with generated CQL2 query. |
| 58 | +- [ ] `PUT /collections/{collection_id}`[^22] |
| 59 | + - **Action:** Update Collection |
| 60 | + - **Strategy:** Fetch Collection and validate CQL2 query; merge Collection with body and validate with generated CQL2 query. |
| 61 | +- [ ] `DELETE /collections/{collection_id}`[^22] |
| 62 | + - **Action:** Delete Collection |
| 63 | + - **Strategy:** Fetch Collection and validate with CQL2 query. |
| 64 | + |
| 65 | +## Example Request Flow for multi-record endpoints |
| 66 | + |
| 67 | +```mermaid |
| 68 | +sequenceDiagram |
| 69 | + Client->>Proxy: GET /collections |
| 70 | + Note over Proxy: EnforceAuth checks credentials |
| 71 | + Note over Proxy: BuildCql2Filter creates filter |
| 72 | + Note over Proxy: ApplyCql2Filter applies filter to request |
| 73 | + Proxy->>STAC API: GET /collections?filter=(collection=landsat) |
| 74 | + STAC API->>Client: Response |
| 75 | +``` |
| 76 | + |
| 77 | +## Example Request Flow for single-record endpoints |
| 78 | + |
| 79 | +The Filter Extension does not apply to fetching individual records. As such, we must validate the record _after_ it is returned from the upstream API but _before_ it is returned to the user: |
| 80 | + |
| 81 | +```mermaid |
| 82 | +sequenceDiagram |
| 83 | + Client->>Proxy: GET /collections/abc123 |
| 84 | + Note over Proxy: EnforceAuth checks credentials |
| 85 | + Note over Proxy: BuildCql2Filter creates filter |
| 86 | + Proxy->>STAC API: GET /collections/abc123 |
| 87 | + Note over Proxy: ApplyCql2Filter validates the response |
| 88 | + STAC API->>Client: Response |
| 89 | +``` |
| 90 | + |
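The post-fetch validation step can be illustrated with a small sketch. It assumes a deliberately tiny subset of `cql2-json` (`=`, `in`, `and`, `or`) and uses hypothetical helper names (`evaluate`, `validate_record`); a real deployment would rely on a full CQL2 library rather than this hand-rolled evaluator, and how the proxy responds to a non-matching record is not specified here.

```python
from typing import Any, Optional


def evaluate(expr: Any, record: dict[str, Any]) -> Any:
    """Evaluate a small, illustrative subset of cql2-json against a record."""
    if isinstance(expr, dict) and "property" in expr:
        # Look at the top level first (id, collection), then under "properties".
        name = expr["property"]
        return record.get(name, record.get("properties", {}).get(name))
    if isinstance(expr, dict) and "op" in expr:
        args = [evaluate(arg, record) for arg in expr["args"]]
        op = expr["op"]
        if op == "=":
            return args[0] == args[1]
        if op == "in":
            return args[0] in args[1]
        if op == "and":
            return all(args)
        if op == "or":
            return any(args)
        raise NotImplementedError(f"operator not covered by this sketch: {op}")
    return expr  # a literal: string, number, boolean, list, ...


def validate_record(record: dict[str, Any], cql2_json: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the upstream record only if it satisfies the generated filter."""
    return record if evaluate(cql2_json, record) is True else None
```

For example, with the filter `{"op": "in", "args": [{"property": "id"}, ["landsat"]]}`, a Collection whose `id` is `landsat` is passed through, while any other Collection yields `None` and is withheld from the client.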
| 91 | +## Authoring Filter Generators |
| 92 | + |
| 93 | +The `ITEMS_FILTER_CLS` configuration option specifies a class that generates a CQL2 filter for each request. The class must define a `__call__` method that accepts a single argument, a dictionary containing the request context, and returns a valid `cql2-text` expression (as a `str`) or `cql2-json` expression (as a `dict`). |
| 94 | + |
| 95 | +> [!TIP] |
| 96 | +> An example integration can be found in [`examples/custom-integration`](https://github.com/developmentseed/stac-auth-proxy/blob/main/examples/custom-integration). |
| 97 | + |
| 98 | +### Basic Filter Generator |
| 99 | + |
| 100 | +```py |
| 101 | +import dataclasses |
| 102 | +from typing import Any |
| 103 | + |
| 104 | +from cql2 import Expr |
| 105 | + |
| 106 | + |
| 107 | +@dataclasses.dataclass |
| 108 | +class ExampleFilter: |
| 109 | +    async def __call__(self, context: dict[str, Any]) -> str: |
| 110 | +        return "true"  # cql2-text expression that matches every record |
| 111 | +``` |
| 112 | + |
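A generator becomes more useful once it inspects the request context. The following is a minimal sketch, assuming the context exposes request headers under `context["req"]["headers"]` (as in the complex example in this document) and that records carry a boolean `private` property; both the class name and the `private` property are hypothetical.

```python
import asyncio
import dataclasses
from typing import Any


@dataclasses.dataclass
class PublicUnlessAuthenticated:
    """Hypothetical generator: anonymous requests only see non-private records."""

    async def __call__(self, context: dict[str, Any]) -> str:
        headers = context["req"]["headers"]
        if headers.get("authorization"):
            # Authenticated callers are not restricted in this sketch.
            return "true"
        # "private" is an assumed property name, used for illustration only.
        return "private = false"


# Usage: call the generator with a request context, as the proxy would.
anonymous = asyncio.run(PublicUnlessAuthenticated()({"req": {"headers": {}}}))
```

Returning `cql2-text` keeps the generator readable; a `cql2-json` `dict` is equally acceptable when the filter is assembled programmatically.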
| 113 | +> [!TIP] |
| 114 | +> Despite being referred to as a _class_, a filter generator could be written as a function. |
| 115 | +> |
| 116 | +> <details> |
| 117 | +> |
| 118 | +> <summary>Example</summary> |
| 119 | +> |
| 120 | +> ```py |
| 121 | +> from typing import Any |
| 122 | +> |
| 123 | +> from cql2 import Expr |
| 124 | +> |
| 125 | +> |
| 126 | +> def example_filter(): |
| 127 | +>     async def example_filter(context: dict[str, Any]) -> str: |
| 128 | +>         return Expr("true").to_text() |
| 129 | +>     return example_filter |
| 130 | +> ``` |
| 131 | +> |
| 132 | +> </details> |
| 133 | + |
| 134 | +### Complex Filter Generator |
| 135 | + |
| 136 | +A more complex filter generator can build its filter from the response of an external API: |
| 137 | + |
| 138 | +```py |
| 139 | +import dataclasses |
| 140 | +from typing import Any, Literal, Optional |
| 141 | + |
| 142 | +from httpx import AsyncClient |
| 143 | +from stac_auth_proxy.utils.cache import MemoryCache |
| 144 | + |
| 145 | + |
| 146 | +@dataclasses.dataclass |
| 147 | +class ApprovedCollectionsFilter: |
| 148 | +    api_url: str |
| 149 | +    kind: Literal["item", "collection"] = "item" |
| 150 | +    client: AsyncClient = dataclasses.field(init=False) |
| 151 | +    cache: MemoryCache = dataclasses.field(init=False) |
| 152 | + |
| 153 | +    def __post_init__(self): |
| 154 | +        # We keep the client in the class instance to avoid creating a new client for |
| 155 | +        # each request, taking advantage of the client's connection pooling. |
| 156 | +        self.client = AsyncClient(base_url=self.api_url) |
| 157 | +        self.cache = MemoryCache(ttl=30) |
| 158 | + |
| 159 | +    async def __call__(self, context: dict[str, Any]) -> dict[str, Any]: |
| 160 | +        token = context["req"]["headers"].get("authorization") |
| 161 | + |
| 162 | +        try: |
| 163 | +            # Reuse a previously cached lookup for this token, if any |
| 164 | +            approved_collections = self.cache[token] |
| 165 | +        except KeyError: |
| 166 | +            # Lookup approved collections from an external API |
| 167 | +            approved_collections = await self.lookup(token) |
| 168 | +            self.cache[token] = approved_collections |
| 169 | + |
| 170 | +        # Build CQL2 filter |
| 171 | +        return { |
| 172 | +            "op": "a_containedby", |
| 173 | +            "args": [ |
| 174 | +                {"property": "collection" if self.kind == "item" else "id"}, |
| 175 | +                approved_collections, |
| 176 | +            ], |
| 177 | +        } |
| 178 | + |
| 179 | +    async def lookup(self, token: Optional[str]) -> list[str]: |
| 180 | +        # Forward the incoming Authorization header as-is (assumed to already include its scheme, e.g. "Bearer ...") |
| 181 | +        headers = {"Authorization": token} if token else {} |
| 182 | +        response = await self.client.get( |
| 183 | +            "/get-approved-collections", |
| 184 | +            headers=headers, |
| 185 | +        ) |
| 186 | +        response.raise_for_status() |
| 187 | +        return response.json()["collections"] |
| 188 | +``` |
| 189 | + |
| 190 | +> [!TIP] |
| 191 | +> Filter generation runs for every relevant request. Consider memoizing external API calls to improve performance. |
| 192 | + |
| 193 | +[^21]: https://github.com/developmentseed/stac-auth-proxy/issues/21 |
| 194 | +[^22]: https://github.com/developmentseed/stac-auth-proxy/issues/22 |
| 195 | +[^37]: https://github.com/developmentseed/stac-auth-proxy/issues/37 |