API and filesystem crawler to retrieve AIS data for consenting vessels
Requires Python 3.11 and Poetry.
Install the dependencies from the ./pyproject.toml file:
poetry install
Each update to the ./pyproject.toml should be followed by a lock file update:
poetry update
Run the application in development mode:
poetry run dev # hot-reload is enabled
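The dev command is presumably a Poetry script; if so, it is declared in ./pyproject.toml along these lines (the module path below is a hypothetical placeholder, not the project's actual entry point):

[tool.poetry.scripts]
# Hypothetical entry point; the real callable is defined in ./pyproject.toml.
dev = "app.main:dev"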
The application is configured through environment variables that set up its processing pipeline, which consists of crawl, policy, and sink stages. Some example configurations based on common use cases:
Crawl all template vessels from a directory, apply the default policy (which forwards crawled vessels without processing), and sink the results to a context broker:
CRAWL__STRATEGY=DEMO
CRAWL__INTERVAL_MINS=10
CRAWL__DEMO_VESSELS_DIR=/cfg/demo-vessels
POLICY__STRATEGY=DEFAULT
SINK__STRATEGY=CONTEXT_BROKER
SINK__CONTEXT_BROKER_HOSTNAME=orion
SINK__CONTEXT_BROKER_PORT=1026
SINK__CONTEXT_BROKER_TENANT=demo
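The double-underscore variable names suggest nested settings parsed with something like pydantic-settings and env_nested_delimiter="__". The sketch below is an assumption about how such a model could look, not the project's actual code:

from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

class CrawlSettings(BaseModel):
    STRATEGY: str = "DEMO"
    INTERVAL_MINS: int = 10
    DEMO_VESSELS_DIR: str | None = None

class Settings(BaseSettings):
    # With this delimiter, CRAWL__INTERVAL_MINS=10 is parsed into
    # settings.CRAWL.INTERVAL_MINS (env var matching is case-insensitive).
    model_config = SettingsConfigDict(env_nested_delimiter="__")
    CRAWL: CrawlSettings = CrawlSettings()

settings = Settings()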
Crawl vessels from the API, listen for nomination updates from the context broker (this requires setting up a subscription; see below), and sink results to a context broker:
CRAWL__STRATEGY=SPIRE
CRAWL__INTERVAL_MINS=10
CRAWL__API_KEY=your-api-key
POLICY__STRATEGY=NOMINATION
POLICY__NOMINATION__STRATEGY=CONTEXT_BROKER
POLICY__NOMINATION__CONTEXT_BROKER_HOSTNAME=orion
POLICY__NOMINATION__CONTEXT_BROKER_PORT=1026
POLICY__NOMINATION__CONTEXT_BROKER_TENANT=arcelormittal
SINK__STRATEGY=CONTEXT_BROKER
SINK__CONTEXT_BROKER_HOSTNAME=orion
SINK__CONTEXT_BROKER_PORT=1026
SINK__CONTEXT_BROKER_TENANT=arcelormittal
Subscription setup (create a subscription in the context broker to listen for nomination updates):
curl --location 'http://localhost:1026/ngsi-ld/v1/subscriptions' \
--header 'NGSILD-Tenant: arcelormittal' \
--header 'Content-Type: application/ld+json' \
--data '{
"description": "Nomination Events",
"type": "Subscription",
"entities": [
{
"type": "https://smartdatamodels.org/dataModel.MarineTransport/Booking"
}
],
"notification": {
"endpoint": {
"uri": "http://sytadel-crawler:8081/nominations/"
}
},
"@context": [
"https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld",
"https://raw.githubusercontent.com/smart-data-models/dataModel.MarineTransport/master/context.jsonld"
]
}'
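On the receiving side, the crawler presumably exposes an HTTP endpoint matching the subscription's notification URI above. A minimal sketch of such a receiver (using FastAPI is an assumption, as are the handler details):

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/nominations/")
async def receive_nominations(request: Request) -> dict:
    # NGSI-LD notifications carry the triggering entities in a top-level "data" array.
    payload = await request.json()
    for entity in payload.get("data", []):
        print("nomination update for", entity.get("id"))
    return {"status": "received"}

# To match the subscription URI, this would need to be served on port 8081,
# e.g. uvicorn app:app --port 8081 (module name hypothetical).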
Crawl vessels from the API, apply a geo-restriction policy that defaults to a Flemish region polygon, and sink results to an in-memory store:
CRAWL__STRATEGY=SPIRE
CRAWL__INTERVAL_MINS=10
CRAWL__API_KEY=your-api-key
POLICY__STRATEGY=GEO_RESTRICTION
# POLICY__GEO_BOUND=POLYGON((3.0 51.0, 3.0 52.0, 4.0 52.0, 4.0 51.0, 3.0 51.0))
SINK__STRATEGY=IN_MEMORY
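The GEO_RESTRICTION policy presumably keeps only positions inside the configured WKT polygon. A sketch of that check using shapely (the function name and the use of shapely are assumptions, not the project's actual code):

from shapely import wkt
from shapely.geometry import Point

# Example bound from the commented POLICY__GEO_BOUND above; WKT uses (lon lat) order.
FLEMISH_BOUND = wkt.loads("POLYGON((3.0 51.0, 3.0 52.0, 4.0 52.0, 4.0 51.0, 3.0 51.0))")

def within_bound(lon: float, lat: float) -> bool:
    # A vessel passes the policy only if its position falls inside the polygon.
    return FLEMISH_BOUND.contains(Point(lon, lat))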
Crawl vessels from the API, apply a subset policy that keeps only the listed vessels, and sink results to Azure Blob Storage (optionally also writing events to Kafka for further processing):
CRAWL__STRATEGY=SPIRE
CRAWL__INTERVAL_MINS=10
CRAWL__API_KEY=your-api-key
POLICY__STRATEGY=SUBSET
POLICY__SUBSET_MMSI_LIST="[123,456,789]"
SINK__STRATEGY=BLOB
SINK__BLOB_STORAGE_ACCOUNT_URL=https://your-account.blob.core.windows.net
SINK__BLOB_STORAGE_ACCESS_KEY=your-account-key
SINK__BLOB_STORAGE_CONTAINER=your-container
SINK__BLOB_STORAGE_DIRECTORY=your-directory
# SINK__BLOB_KAFKA_SERVERS=your-broker
# SINK__BLOB_KAFKA_TOPIC=your-topic
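A hedged sketch of what the BLOB sink might do with these settings; the client calls are standard azure-storage-blob, but the surrounding function and naming scheme are assumptions:

import json
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://your-account.blob.core.windows.net",
    credential="your-account-key",
)
container = service.get_container_client("your-container")

def sink_vessels(vessels: list[dict], blob_name: str) -> None:
    # One JSON document per crawl, written under the configured directory.
    container.upload_blob(
        name=f"your-directory/{blob_name}",
        data=json.dumps(vessels),
        overwrite=True,
    )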
Feel free to mix and match configurations to suit your use case, and extend the application with new strategies as needed.
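As a starting point for a new strategy, each stage likely sits behind a small interface; the Protocol below is an illustrative guess at what a sink seam could look like, not the codebase's actual types:

from typing import Protocol

class Sink(Protocol):
    def write(self, vessels: list[dict]) -> None: ...

class InMemorySink:
    """Keeps crawled vessels in memory, in the spirit of SINK__STRATEGY=IN_MEMORY."""

    def __init__(self) -> None:
        self.items: list[dict] = []

    def write(self, vessels: list[dict]) -> None:
        self.items.extend(vessels)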
- Branch naming strategy: type/JIRA-123/branch-name (e.g. feature/SYT-123/add-new-endpoint)
- Commit messages should follow Conventional Commits (e.g. feat: add new endpoint)
- Pre-commit hooks are enabled to ensure code quality. Be sure to run pre-commit install (after poetry install) to install the hooks. See .pre-commit-config.yaml for the list of hooks.
- Linting and testing scripts are available in the ./scripts folder. Run poetry run lint and poetry run test to lint and test the code, respectively. Tests create coverage reports in the .htmlcov folder.