Merged
Changes from 7 commits
4 changes: 4 additions & 0 deletions .github/workflows/cicd.yml
@@ -28,6 +28,7 @@ jobs:
xpack.security.enabled: false
xpack.security.transport.ssl.enabled: false
ES_JAVA_OPTS: -Xms512m -Xmx1g
action.destructive_requires_name: false
ports:
- 9200:9200

@@ -44,6 +45,7 @@ jobs:
xpack.security.enabled: false
xpack.security.transport.ssl.enabled: false
ES_JAVA_OPTS: -Xms512m -Xmx1g
action.destructive_requires_name: false
ports:
- 9400:9400

@@ -60,6 +62,7 @@ jobs:
plugins.security.disabled: true
plugins.security.ssl.http.enabled: true
OPENSEARCH_JAVA_OPTS: -Xms512m -Xmx512m
action.destructive_requires_name: false
ports:
- 9202:9202

@@ -120,5 +123,6 @@ jobs:
ES_PORT: ${{ matrix.backend == 'elasticsearch7' && '9400' || matrix.backend == 'elasticsearch8' && '9200' || '9202' }}
ES_HOST: 172.17.0.1
ES_USE_SSL: false
DATABASE_REFRESH: true
ES_VERIFY_CERTS: false
BACKEND: ${{ matrix.backend == 'elasticsearch7' && 'elasticsearch' || matrix.backend == 'elasticsearch8' && 'elasticsearch' || 'opensearch' }}
26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,32 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [Unreleased]

### Added

- Added comprehensive index management system with dynamic selection and insertion strategies for improved performance and scalability [#405](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/405)
- Added `ENABLE_DATETIME_INDEX_FILTERING` environment variable to enable datetime-based index selection using collection IDs. When enabled, the system creates indexes with UUID-based names and manages them through time-based aliases. Default is `false`. [#405](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/405)
- Added `DATETIME_INDEX_MAX_SIZE_GB` environment variable to set maximum size limit in GB for datetime-based indexes. When an index exceeds this size, a new time-partitioned index will be created. Note: add +20% to target size due to ES/OS compression. Default is `25` GB. Only applies when `ENABLE_DATETIME_INDEX_FILTERING` is enabled. [#405](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/405)
- Added index operations system with unified interface for both Elasticsearch and OpenSearch [#405](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/405):
  - `IndexOperations` class with common index creation and management methods
  - UUID-based physical index naming: `{prefix}_{collection-id}_{uuid4}`
  - Alias management: main collection alias, temporal aliases, and closed index aliases
  - Automatic alias updates when indexes reach size limits
- Added datetime-based index selection strategies with caching support [#405](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/405):
  - `DatetimeBasedIndexSelector` for temporal filtering with intelligent caching
  - `IndexCacheManager` with configurable TTL-based cache expiration (default 1 hour)
  - `IndexAliasLoader` for alias management and cache refresh
  - `UnfilteredIndexSelector` as fallback for returning all available indexes
- Added index insertion strategies with automatic partitioning [#405](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/405):
  - Simple insertion strategy (`SimpleIndexInserter`) for traditional single-index-per-collection approach
  - Datetime-based insertion strategy (`DatetimeIndexInserter`) with time-based partitioning
  - Automatic index size monitoring and splitting when limits exceeded
  - Handling of chronologically early data and bulk operations
- Added index management utilities [#405](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/405):
  - `IndexSizeManager` for size monitoring and overflow handling with compression awareness
  - `DatetimeIndexManager` for datetime-based index operations and validation
  - Factory patterns (`IndexInsertionFactory`, `IndexSelectorFactory`) for strategy creation based on configuration


## [v6.1.0] - 2025-07-24

### Added
15 changes: 10 additions & 5 deletions Makefile
@@ -27,7 +27,7 @@ run_os = docker compose \
.PHONY: image-deploy-es
image-deploy-es:
docker build -f dockerfiles/Dockerfile.dev.es -t stac-fastapi-elasticsearch:latest .

.PHONY: image-deploy-os
image-deploy-os:
docker build -f dockerfiles/Dockerfile.dev.os -t stac-fastapi-opensearch:latest .
@@ -71,14 +71,19 @@ test-opensearch:
-$(run_os) /bin/bash -c 'export && ./scripts/wait-for-it-es.sh opensearch:9202 && cd stac_fastapi/tests/ && pytest'
docker compose down

-.PHONY: test
-test:
-	-$(run_es) /bin/bash -c 'export && ./scripts/wait-for-it-es.sh elasticsearch:9200 && cd stac_fastapi/tests/ && pytest --cov=stac_fastapi --cov-report=term-missing'
+.PHONY: test-datetime-filtering-es
+test-datetime-filtering-es:
+	-$(run_es) /bin/bash -c 'export ENABLE_DATETIME_INDEX_FILTERING=true && ./scripts/wait-for-it-es.sh elasticsearch:9200 && cd stac_fastapi/tests/ && pytest -s --cov=stac_fastapi --cov-report=term-missing -m datetime_filtering'
 	docker compose down

-	-$(run_os) /bin/bash -c 'export && ./scripts/wait-for-it-es.sh opensearch:9202 && cd stac_fastapi/tests/ && pytest --cov=stac_fastapi --cov-report=term-missing'
+.PHONY: test-datetime-filtering-os
+test-datetime-filtering-os:
+	-$(run_os) /bin/bash -c 'export ENABLE_DATETIME_INDEX_FILTERING=true && ./scripts/wait-for-it-es.sh opensearch:9202 && cd stac_fastapi/tests/ && pytest -s --cov=stac_fastapi --cov-report=term-missing -m datetime_filtering'
 	docker compose down

+.PHONY: test
+test: test-elasticsearch test-datetime-filtering-es test-opensearch test-datetime-filtering-os

.PHONY: run-database-es
run-database-es:
docker compose run --rm elasticsearch
76 changes: 75 additions & 1 deletion README.md
@@ -230,6 +230,81 @@ You can customize additional settings in your `.env` file:
> [!NOTE]
> The variables `ES_HOST`, `ES_PORT`, `ES_USE_SSL`, `ES_VERIFY_CERTS` and `ES_TIMEOUT` apply to both Elasticsearch and OpenSearch backends, so there is no need to rename the key names to `OS_` even if you're using OpenSearch.

# Datetime-Based Index Management

## Overview

SFEOS supports two indexing strategies for managing STAC items:

1. **Simple Indexing** (default) - One index per collection
2. **Datetime-Based Indexing** - Time-partitioned indexes with automatic management

The datetime-based indexing strategy is particularly useful for large temporal datasets. When a query includes a datetime parameter, the system searches only the indexes whose time range matches it, which can make searches **several times faster** and significantly **reduce database load**.
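
As a rough illustration of why a datetime filter narrows the search, the sketch below picks the partitions whose time range overlaps the query interval. It is a simplified stand-in, not the project's `DatetimeBasedIndexSelector`: the `Partition` tuple and `select_partitions` function are hypothetical names, and the real system tracks partitions through aliases rather than an in-memory list.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical representation: (alias, start, end); end is None while the
# partition is still open (no size limit reached yet).
Partition = Tuple[str, date, Optional[date]]


def select_partitions(
    partitions: List[Partition],
    query_start: Optional[date],
    query_end: Optional[date],
) -> List[str]:
    """Return the aliases whose time range overlaps the query interval.

    A bound of None on the query means "unbounded" on that side.
    """
    selected = []
    for alias, start, end in partitions:
        starts_in_time = query_end is None or start <= query_end
        ends_in_time = end is None or query_start is None or end >= query_start
        if starts_in_time and ends_in_time:
            selected.append(alias)
    return selected


partitions = [
    ("closed", date(2024, 1, 1), date(2024, 3, 15)),
    ("open", date(2024, 3, 15), None),
]
february_only = select_partitions(partitions, date(2024, 2, 1), date(2024, 2, 28))
```

With no datetime filter (both bounds `None`), every partition is returned, which corresponds to searching the whole collection.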

## When to Use

**Recommended for:**
- Systems with large collections containing millions of items
- Systems requiring high-performance temporal searching

**Pros:**
- Queries with a datetime filter can be several times faster
- Reduced database load, since only relevant indexes are searched

**Cons:**
- Slightly longer item indexing time (automatic index management)
- Greater management complexity

## Configuration

### Enabling Datetime-Based Indexing

Enable datetime-based indexing by setting the following environment variable:

```bash
ENABLE_DATETIME_INDEX_FILTERING=true
```

### Related Configuration Variables

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `ENABLE_DATETIME_INDEX_FILTERING` | Enables time-based index partitioning | `false` | `true` |
| `DATETIME_INDEX_MAX_SIZE_GB` | Maximum size of a datetime index, in GB. Set it about 20% above the target size to account for ES/OS compression | `25` | `50` |
| `STAC_ITEMS_INDEX_PREFIX` | Prefix for item indexes | `items_` | `stac_items_` |
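
A minimal sketch of how these variables might be read, assuming plain environment lookups with the defaults from the table. The helper names and the accepted boolean spellings are assumptions; the project's own settings code may parse them differently.

```python
import os
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    # Assumed set of truthy spellings; not taken from the project.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_index_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the three index-related variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "datetime_filtering": _as_bool(
            env.get("ENABLE_DATETIME_INDEX_FILTERING", "false")
        ),
        "max_size_gb": float(env.get("DATETIME_INDEX_MAX_SIZE_GB", "25")),
        "items_prefix": env.get("STAC_ITEMS_INDEX_PREFIX", "items_"),
    }


defaults = load_index_settings({})
```

Passing a mapping instead of relying on `os.environ` keeps the sketch easy to try out without touching the real environment.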

## How Datetime-Based Indexing Works

### Index and Alias Naming Convention

The system uses the following naming convention:

**Physical indexes:**
```
{ITEMS_INDEX_PREFIX}{collection-id}_{uuid4}
```

**Aliases:**
```
{ITEMS_INDEX_PREFIX}{collection-id} # Main collection alias
{ITEMS_INDEX_PREFIX}{collection-id}_{start-datetime} # Temporal alias
{ITEMS_INDEX_PREFIX}{collection-id}_{start-datetime}_{end-datetime} # Closed index alias
```

**Example:**

*Physical indexes:*
- `items_sentinel-2-l2a_a1b2c3d4-e5f6-7890-abcd-ef1234567890`

*Aliases:*
- `items_sentinel-2-l2a` - main collection alias
- `items_sentinel-2-l2a_2024-01-01` - temporal alias of the active index, starting January 1, 2024
- `items_sentinel-2-l2a_2024-01-01_2024-03-15` - closed index alias (reached size limit)
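
The convention above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes dates are rendered as `YYYY-MM-DD` as in the example; the actual implementation may normalize collection IDs or format datetimes differently.

```python
import uuid
from datetime import date
from typing import Optional


def physical_index_name(prefix: str, collection_id: str) -> str:
    """Build a physical index name: {prefix}{collection-id}_{uuid4}."""
    return f"{prefix}{collection_id}_{uuid.uuid4()}"


def alias_name(
    prefix: str,
    collection_id: str,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> str:
    """Build an alias name.

    No dates      -> main collection alias
    start only    -> temporal alias
    start and end -> closed index alias
    """
    name = f"{prefix}{collection_id}"
    if start is not None:
        name += f"_{start.isoformat()}"
        if end is not None:
            name += f"_{end.isoformat()}"
    return name


main_alias = alias_name("items_", "sentinel-2-l2a")
temporal_alias = alias_name("items_", "sentinel-2-l2a", date(2024, 1, 1))
closed_alias = alias_name(
    "items_", "sentinel-2-l2a", date(2024, 1, 1), date(2024, 3, 15)
)
```

Here `closed_alias` evaluates to `items_sentinel-2-l2a_2024-01-01_2024-03-15`, matching the example list.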

### Index Size Management

**Important - Data Compression:** Elasticsearch and OpenSearch compress data automatically, and the configured `DATETIME_INDEX_MAX_SIZE_GB` limit refers to the compressed size on disk. Set the limit about 20% above your target size to account for compression overhead and metadata.
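
For example, assuming the recommendation means setting the limit to 1.2 times the size you are aiming for, the arithmetic looks like this. `configured_limit_gb` and `exceeds_limit` are hypothetical helpers, not part of SFEOS.

```python
def configured_limit_gb(target_gb: float, headroom: float = 0.20) -> float:
    """Value to use for DATETIME_INDEX_MAX_SIZE_GB (assumed: target size + 20%)."""
    return round(target_gb * (1 + headroom), 2)


def exceeds_limit(index_size_gb: float, limit_gb: float) -> bool:
    """True once the on-disk size is above the limit, i.e. a new index is due."""
    return index_size_gb > limit_gb


limit = configured_limit_gb(25)  # 30.0
```

So a 25 GB target would be configured as `DATETIME_INDEX_MAX_SIZE_GB=30` under this reading.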

## Interacting with the API

- **Creating a Collection**:
@@ -538,4 +613,3 @@ You can customize additional settings in your `.env` file:
- Ensures fair resource allocation among all clients

- **Examples**: Implementation examples are available in the [examples/rate_limit](examples/rate_limit) directory.

3 changes: 3 additions & 0 deletions compose.yml
@@ -21,6 +21,7 @@ services:
- ES_USE_SSL=false
- ES_VERIFY_CERTS=false
- BACKEND=elasticsearch
- DATABASE_REFRESH=true
ports:
- "8080:8080"
volumes:
@@ -72,6 +73,7 @@ services:
hostname: elasticsearch
environment:
ES_JAVA_OPTS: -Xms512m -Xmx1g
action.destructive_requires_name: false
volumes:
- ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
- ./elasticsearch/snapshots:/usr/share/elasticsearch/snapshots
@@ -86,6 +88,7 @@ services:
- discovery.type=single-node
- plugins.security.disabled=true
- OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m
- action.destructive_requires_name=false
volumes:
- ./opensearch/config/opensearch.yml:/usr/share/opensearch/config/opensearch.yml
- ./opensearch/snapshots:/usr/share/opensearch/snapshots
23 changes: 19 additions & 4 deletions stac_fastapi/core/stac_fastapi/core/core.py
@@ -37,6 +37,7 @@
BulkTransactionMethod,
Items,
)
from stac_fastapi.sfeos_helpers.database import return_date
Collaborator

The core module should not import from sfeos_helpers. Can we apply this function in the Elasticsearch/OpenSearch database_logic.py instead? Ideally the core module should be database-agnostic.

Contributor Author

Done.

from stac_fastapi.types import stac as stac_types
from stac_fastapi.types.conformance import BASE_CONFORMANCE_CLASSES
from stac_fastapi.types.core import AsyncBaseCoreClient
@@ -324,10 +325,16 @@ async def item_collection(
search=search, collection_ids=[collection_id]
)

-        if datetime:
+        try:
+            datetime_search = return_date(datetime)
             search = self.database.apply_datetime_filter(
-                search=search, interval=datetime
+                search=search, datetime_search=datetime_search
             )
+        except (ValueError, TypeError) as e:
+            # Handle invalid interval formats if return_date fails
+            msg = f"Invalid interval format: {datetime}, error: {e}"
+            logger.error(msg)
+            raise HTTPException(status_code=400, detail=msg)

if bbox:
bbox = [float(x) for x in bbox]
@@ -342,6 +349,7 @@ async def item_collection(
sort=None,
token=token,
collection_ids=[collection_id],
datetime_search=datetime_search,
Collaborator

Is this needed? We apply the datetime_search to the search variable on line 331. If this is optional, could we omit it?

Contributor Author

This is needed in this function so that the index containing the item can be found.

)

items = [
@@ -500,10 +508,16 @@ async def post_search(
search=search, collection_ids=search_request.collections
)

-        if search_request.datetime:
+        try:
+            datetime_search = return_date(search_request.datetime)
             search = self.database.apply_datetime_filter(
-                search=search, interval=search_request.datetime
+                search=search, datetime_search=datetime_search
             )
+        except (ValueError, TypeError) as e:
+            # Handle invalid interval formats if return_date fails
+            msg = f"Invalid interval format: {search_request.datetime}, error: {e}"
+            logger.error(msg)
+            raise HTTPException(status_code=400, detail=msg)

if search_request.bbox:
bbox = search_request.bbox
@@ -560,6 +574,7 @@ async def post_search(
token=search_request.token,
sort=sort,
collection_ids=search_request.collections,
datetime_search=datetime_search,
Collaborator

Same here -- Is this needed? We apply the datetime_search to the search variable on line 513. If this is optional, could we omit it?

Contributor Author

As above.

)

fields = (
1 change: 1 addition & 0 deletions stac_fastapi/core/stac_fastapi/core/datetime_utils.py
@@ -1,4 +1,5 @@
"""Utility functions to handle datetime parsing."""

from datetime import datetime, timezone

from stac_fastapi.types.rfc3339 import rfc3339_str_to_datetime
1 change: 1 addition & 0 deletions stac_fastapi/core/stac_fastapi/core/serializers.py
@@ -1,4 +1,5 @@
"""Serializers."""

import abc
from copy import deepcopy
from typing import Any, List, Optional
1 change: 1 addition & 0 deletions stac_fastapi/core/stac_fastapi/core/session.py
@@ -1,4 +1,5 @@
"""database session management."""

import logging

import attr