@wietzesuijker wietzesuijker commented Oct 8, 2025

Split into 5 stacked PRs for review.

Sequential PRs

Review and merge in order:

How stacked PRs work

  • After each merge, GitHub automatically retargets the next PR's base branch (once the merged branch is deleted)
  • Each PR shows only its changes (not cumulative)

Architecture

RabbitMQ → EventSource → Sensor → Argo Workflows
                                   ├─ convert-geozarr (eopf-geozarr)
                                   ├─ register-stac (STAC item creation)
                                   └─ augment-stac (add viewer links)

Reliability Features

  • Automatic retry with exponential backoff
  • Configurable timeouts via environment variables
  • Workflow step timeouts to prevent runaway processes
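
For reviewers, the retry behaviour can be pictured roughly as below. This is a minimal sketch, not the code in the PR: the environment variable names come from the commit messages further down, while the default values, the helper names `backoff_delays` and `call_with_retry`, and the set of retried exceptions are assumptions made for illustration.

```python
from __future__ import annotations

import os
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Variable names follow the PR text; the default values are assumptions.
HTTP_TIMEOUT = float(os.getenv("HTTP_TIMEOUT", "30"))  # would be passed to the HTTP client
RETRY_ATTEMPTS = int(os.getenv("RETRY_ATTEMPTS", "3"))
RETRY_MAX_WAIT = float(os.getenv("RETRY_MAX_WAIT", "60"))


def backoff_delays(attempts: int, max_wait: float, base: float = 1.0) -> list[float]:
    """Waits between attempts: base * 2**n seconds, capped at max_wait."""
    return [min(base * 2**n, max_wait) for n in range(max(attempts - 1, 0))]


def call_with_retry(
    func: Callable[[], T],
    attempts: int = RETRY_ATTEMPTS,
    max_wait: float = RETRY_MAX_WAIT,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the listed (assumed transient) exceptions."""
    delays = backoff_delays(attempts, max_wait)
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable only so the behaviour can be checked without real waiting; no jitter is applied in this sketch.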

Testing

✅ Tests passing
✅ E2E workflow validated
✅ Pre-commit hooks passing


Note: This PR (#1) remains open for tracking. Please review and merge the 5 linked PRs above sequentially.

Add complete Argo Workflows infrastructure for geozarr pipeline with automated AMQP event triggering.

Workflow pipeline:
- Convert: Sentinel-2 Zarr → GeoZarr (cloud-optimized)
- Register: Create STAC item with metadata
- Augment: Add visualization links (XYZ tiles, TileJSON)
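
As an illustration of the augment step, the sketch below appends XYZ and TileJSON links to a STAC item held as a plain dict. It is not taken from `augment_stac_item.py`: the function name `add_viewer_links`, the `rel` values, and the tiler URL layout are all assumptions, and the real script may build its links differently.

```python
from __future__ import annotations

import copy


def add_viewer_links(item: dict, tiler_base: str) -> dict:
    """Return a copy of a STAC item with XYZ and TileJSON links appended.

    The URL layout under tiler_base is hypothetical.
    """
    out = copy.deepcopy(item)
    base = f"{tiler_base.rstrip('/')}/collections/{out['collection']}/items/{out['id']}"
    new_links = [
        {
            "rel": "xyz",
            "type": "image/png",
            "title": "XYZ tiles",
            # Doubled braces keep {z}/{x}/{y} as literal template placeholders.
            "href": f"{base}/tiles/WebMercatorQuad/{{z}}/{{x}}/{{y}}.png",
        },
        {
            "rel": "tilejson",
            "type": "application/json",
            "title": "TileJSON",
            "href": f"{base}/WebMercatorQuad/tilejson.json",
        },
    ]
    # Skip any rel that is already present so repeated runs do not duplicate links.
    existing = {link.get("rel") for link in out.get("links", [])}
    out.setdefault("links", []).extend(
        link for link in new_links if link["rel"] not in existing
    )
    return out
```

Returning a copy and skipping existing `rel` values keeps the step safe to re-run on an item that was already augmented.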

Event-driven automation:
- AMQP EventSource subscribes to RabbitMQ queue
- Sensor triggers workflows on incoming messages
- RBAC configuration for secure execution

Configuration:
- Python dependencies (pyproject.toml, uv.lock)
- Pre-commit hooks (ruff, mypy, yaml validation)
- TTL cleanup (24h auto-delete completed workflows)

@wietzesuijker wietzesuijker force-pushed the feature/geozarr-pipeline branch 5 times, most recently from 678ebd6 to d25b852 Compare October 8, 2025 13:55

Add STAC registration, augmentation, and workflow submission scripts.

- register_stac.py: Create/update STAC items with S3→HTTPS rewriting
- augment_stac_item.py: Add visualization links (XYZ tiles, TileJSON)
- submit_via_api.py: Submit workflows via Argo API for testing
- Retry with exponential backoff on transient failures
- Configurable timeouts via HTTP_TIMEOUT, RETRY_ATTEMPTS, RETRY_MAX_WAIT
- Workflow step timeouts: 1h convert, 5min register/augment
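
The S3→HTTPS rewriting mentioned for `register_stac.py` might look something like the following. This is a sketch under stated assumptions: the virtual-hosted URL form, the placeholder endpoint `s3.example.com`, and the helper names `s3_to_https` and `rewrite_asset_hrefs` are illustrative and not confirmed by the PR.

```python
from urllib.parse import urlparse


def s3_to_https(href: str, endpoint: str = "s3.example.com") -> str:
    """Rewrite s3://bucket/key to https://bucket.<endpoint>/key.

    Non-S3 hrefs are returned unchanged. The endpoint default is a placeholder.
    """
    parsed = urlparse(href)
    if parsed.scheme != "s3":
        return href
    key = parsed.path.lstrip("/")
    return f"https://{parsed.netloc}.{endpoint}/{key}"


def rewrite_asset_hrefs(item: dict, endpoint: str) -> dict:
    """Apply s3_to_https to every asset href of a STAC item dict, in place."""
    for asset in item.get("assets", {}).values():
        if "href" in asset:
            asset["href"] = s3_to_https(asset["href"], endpoint)
    return item
```

Leaving non-S3 hrefs untouched means the rewrite can run on items that already carry HTTPS links.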

Add operator notebooks and environment configuration.

- Submit workflow examples (AMQP and direct API)
- Environment variable template (.env.example)
- .gitignore for Python, IDEs, Kubernetes configs

Add container build configuration and development tooling.

- Dockerfile for data-pipeline image
- Makefile for common tasks (build, push, test)
- GitHub Container Registry integration

Add comprehensive testing and project documentation.

- Unit tests for register_stac and augment_stac_item
- Integration tests for workflow submission
- E2E test configuration
- Project README, CONTRIBUTING, QUICKSTART guides
- CI workflow (GitHub Actions)

@wietzesuijker wietzesuijker force-pushed the feature/geozarr-pipeline branch from d25b852 to d52e21e Compare October 8, 2025 14:07

wietzesuijker added a commit that referenced this pull request Oct 14, 2025

Replace logger.error() calls with logger.exception() to capture full stack
traces in production Kubernetes logs. Adds structured context via extra={}
for improved observability:
- load_payload: Include file path on FileNotFoundError/JSONDecodeError
- format_routing_key: Show template and available fields on KeyError
- main: Include exchange/routing_key/host on publish failure
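
For context, the pattern described for `load_payload` could be sketched as below. The function signature, log messages, and the `payload_path` key are assumptions; only the use of `logger.exception()` with `extra={}` on `FileNotFoundError`/`JSONDecodeError` comes from the commit message.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def load_payload(path: Path) -> dict:
    """Load a JSON payload, logging a full traceback plus the path on failure."""
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        # logger.exception logs at ERROR level and attaches the active traceback.
        # Keys in `extra` must not collide with built-in LogRecord attributes
        # such as "filename" or "module", hence "payload_path".
        logger.exception("Payload file not found", extra={"payload_path": str(path)})
        raise
    except json.JSONDecodeError:
        logger.exception("Payload is not valid JSON", extra={"payload_path": str(path)})
        raise
```

The `extra` fields become attributes on the log record, so a structured (for example JSON) formatter can emit them alongside the stack trace.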

Closes phase3-analysis #1 (error logging completion)