
imec-int/sytadel-crawler


Sytadel Crawler

API and filesystem crawler to retrieve AIS data for consenting vessels

Getting Started

Installation

Requires Python 3.11 and Poetry.

Install the dependencies from the ./pyproject.toml file:

poetry install

Each change to the ./pyproject.toml should be followed by a lock file refresh (poetry lock re-resolves the new constraints; poetry update additionally bumps all dependencies to their latest allowed versions):

poetry lock

Running the application

poetry run dev # hot-reload is enabled

Configuration

The application uses environment variables to set up a processing pipeline composed of a crawl, a policy, and a sink stage. Some example configurations, by use case:

Demo configuration

Crawl all template vessels from a directory, apply the default policy (which forwards crawled vessels without further processing), and sink results to a context broker:

CRAWL__STRATEGY=DEMO
CRAWL__INTERVAL_MINS=10
CRAWL__DEMO_VESSELS_DIR=/cfg/demo-vessels
POLICY__STRATEGY=DEFAULT
SINK__STRATEGY=CONTEXT_BROKER
SINK__CONTEXT_BROKER_HOSTNAME=orion
SINK__CONTEXT_BROKER_PORT=1026
SINK__CONTEXT_BROKER_TENANT=demo
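The double-underscore variable names group settings by pipeline stage (CRAWL, POLICY, SINK). As a minimal sketch of this convention, the function below collects such variables into per-stage sections; it is illustrative only, not the crawler's actual configuration loader, and deeper nesting (e.g. POLICY__NOMINATION__STRATEGY) is kept as a compound key:

```python
def parse_pipeline_env(environ: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group PREFIX__KEY environment variables into per-stage sections."""
    config: dict[str, dict[str, str]] = {}
    for name, value in environ.items():
        prefix, sep, key = name.partition("__")
        if sep and prefix in {"CRAWL", "POLICY", "SINK"}:
            # Nested keys (e.g. NOMINATION__STRATEGY) stay compound here.
            config.setdefault(prefix.lower(), {})[key.lower()] = value
    return config

demo_env = {
    "CRAWL__STRATEGY": "DEMO",
    "CRAWL__INTERVAL_MINS": "10",
    "SINK__STRATEGY": "CONTEXT_BROKER",
}
print(parse_pipeline_env(demo_env))
```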

Nomination configuration

Crawl vessels from the API, listen for nomination updates from the context broker (this requires setting up a subscription, see below), and sink results to a context broker:

CRAWL__STRATEGY=SPIRE
CRAWL__INTERVAL_MINS=10
CRAWL__API_KEY=your-api-key
POLICY__STRATEGY=NOMINATION
POLICY__NOMINATION__STRATEGY=CONTEXT_BROKER
POLICY__NOMINATION__CONTEXT_BROKER_HOSTNAME=orion
POLICY__NOMINATION__CONTEXT_BROKER_PORT=1026
POLICY__NOMINATION__CONTEXT_BROKER_TENANT=arcelormittal
SINK__STRATEGY=CONTEXT_BROKER
SINK__CONTEXT_BROKER_HOSTNAME=orion
SINK__CONTEXT_BROKER_PORT=1026
SINK__CONTEXT_BROKER_TENANT=arcelormittal

Subscription setup (create a subscription in the context broker to listen for nomination updates):

curl --location 'http://localhost:1026/ngsi-ld/v1/subscriptions' \
--header 'NGSILD-Tenant: arcelormittal' \
--header 'Content-Type: application/ld+json' \
--data '{
    "description": "Nomination Events",
    "type": "Subscription",
    "entities": [
        {
            "type": "https://smartdatamodels.org/dataModel.MarineTransport/Booking"
        }
    ],
    "notification": {
        "endpoint": {
            "uri": "http://sytadel-crawler:8081/nominations/"
        }
    },
    "@context": [
      "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld",
      "https://raw.githubusercontent.com/smart-data-models/dataModel.MarineTransport/master/context.jsonld"
    ]
}'
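When the subscription fires, the context broker POSTs an NGSI-LD notification to the configured endpoint (the matched entities arrive under the payload's "data" key, per the NGSI-LD notification format). A minimal sketch of pulling the Booking entities out of such a payload; the function name and sample payload are illustrative, not the crawler's actual handler:

```python
import json

BOOKING_TYPE = "https://smartdatamodels.org/dataModel.MarineTransport/Booking"

def extract_bookings(body: bytes) -> list[dict]:
    """Return the Booking entities carried in an NGSI-LD notification body."""
    notification = json.loads(body)
    return [e for e in notification.get("data", []) if e.get("type") == BOOKING_TYPE]

# Illustrative notification payload, shaped like an NGSI-LD Notification.
sample = json.dumps({
    "id": "urn:ngsi-ld:Notification:1",
    "type": "Notification",
    "data": [{"id": "urn:ngsi-ld:Booking:42", "type": BOOKING_TYPE}],
}).encode()
print(extract_bookings(sample))
```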

Geo-restriction configuration

Crawl vessels from the API, apply the geo-restriction policy (which defaults to a polygon covering the Flemish region), and sink results to an in-memory store:

CRAWL__STRATEGY=SPIRE
CRAWL__INTERVAL_MINS=10
CRAWL__API_KEY=your-api-key
POLICY__STRATEGY=GEO_RESTRICTION
# POLICY__GEO_BOUND=POLYGON((3.0 51.0, 3.0 52.0, 4.0 52.0, 4.0 51.0, 3.0 51.0))
SINK__STRATEGY=IN_MEMORY
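The commented-out POLICY__GEO_BOUND expresses the bound as a WKT polygon of lon/lat pairs. As a rough sketch of how such a value could be parsed and used to test a vessel position, here is a plain ray-casting point-in-polygon check; the real policy may well use a proper geometry library instead:

```python
def parse_wkt_polygon(wkt: str) -> list[tuple[float, float]]:
    """Parse a simple 'POLYGON((x y, x y, ...))' string into (lon, lat) pairs."""
    coords = wkt.strip().removeprefix("POLYGON((").removesuffix("))")
    return [tuple(map(float, pair.split())) for pair in coords.split(",")]

def contains(polygon: list[tuple[float, float]], lon: float, lat: float) -> bool:
    """Ray-casting (even-odd rule) point-in-polygon test."""
    inside = False
    # The WKT ring repeats its first point, so consecutive pairs cover all edges.
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:]):
        if (y1 > lat) != (y2 > lat):
            # Longitude where this edge crosses the horizontal ray at `lat`.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

bound = parse_wkt_polygon("POLYGON((3.0 51.0, 3.0 52.0, 4.0 52.0, 4.0 51.0, 3.0 51.0))")
print(contains(bound, 3.5, 51.5))  # position inside the box
print(contains(bound, 5.0, 51.5))  # position outside the box
```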

Subset configuration

Crawl vessels from the API, apply the subset policy to filter down to the provided vessels, and sink results to Azure Blob Storage (optionally also writing events to Kafka for further processing):

CRAWL__STRATEGY=SPIRE
CRAWL__INTERVAL_MINS=10
CRAWL__API_KEY=your-api-key
POLICY__STRATEGY=SUBSET
POLICY__SUBSET_MMSI_LIST="[123,456,789]"
SINK__STRATEGY=BLOB
SINK__BLOB_STORAGE_ACCOUNT_URL=https://your-account.blob.core.windows.net
SINK__BLOB_STORAGE_ACCESS_KEY=your-account-key
SINK__BLOB_STORAGE_CONTAINER=your-container
SINK__BLOB_STORAGE_DIRECTORY=your-directory
# SINK__BLOB_KAFKA_SERVERS=your-broker
# SINK__BLOB_KAFKA_TOPIC=your-topic
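POLICY__SUBSET_MMSI_LIST is given as a JSON array of MMSI numbers. A small sketch of parsing that value and applying it as a filter; the vessel dict shape here is illustrative, not the crawler's actual data model:

```python
import json

def parse_mmsi_list(raw: str) -> set[int]:
    """Parse an env value like '[123,456,789]' into a set of MMSIs."""
    return set(json.loads(raw))

def apply_subset(vessels: list[dict], allowed: set[int]) -> list[dict]:
    """Keep only vessels whose MMSI is in the configured subset."""
    return [v for v in vessels if v.get("mmsi") in allowed]

allowed = parse_mmsi_list("[123,456,789]")
crawled = [{"mmsi": 123, "name": "A"}, {"mmsi": 999, "name": "B"}]
print(apply_subset(crawled, allowed))  # only the vessel with MMSI 123 remains
```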

Feel free to mix and match configurations to suit your use case, and extend the application with new strategies as needed.

Contributing

  1. Branch naming strategy: type/JIRA-123/branch-name (e.g. feature/SYT-123/add-new-endpoint)
  2. Commit messages should follow Conventional Commits (e.g. feat: add new endpoint)
  3. Pre-commit hooks are enabled to ensure code quality. Be sure to run pre-commit install (after poetry install) to install the hooks. See .pre-commit-config.yaml for the list of hooks.
  4. Linting and testing scripts are available in the ./scripts folder. Run poetry run lint and poetry run test to lint and test the code respectively. Tests write coverage reports to the .htmlcov folder.
