Nisaba is a Java Spring Boot application that monitors NeTEx (Network Timetable Exchange) dataset exports and publishes Kafka events when new timetable datasets become available. It acts as a bridge between Marduk's dataset exports and downstream consumers that need to be notified of new timetable data.
Nisaba implements an event-driven workflow that:
- Monitors a Google PubSub queue (`NetexExportNotificationQueue`) for NeTEx export notifications from Marduk
- Downloads NeTEx datasets from Google Cloud Storage when notified
- Extracts dataset creation dates from `CompositeFrame` XML elements within the NeTEx archive
- Deduplicates using a Kafka-backed idempotent repository that tracks previously processed datasets
- Publishes Kafka events only for new datasets (identified by codespace + creation date)
The technology stack:
- Java 21 with Spring Boot 3.x
- Apache Camel 4.8.9 for enterprise integration patterns and routing
- Google Cloud Platform
  - Cloud Storage (GCS) for dataset storage
  - PubSub for event notifications
- Apache Kafka with Avro serialization for event publishing
- Maven for build management
- Docker for containerization
- Helm for Kubernetes deployment
Project layout:

```
nisaba/
├── src/main/java/no/entur/nisaba/
│   ├── App.java                                # Spring Boot application entry point
│   ├── Constants.java                          # Application constants
│   ├── config/                                 # Configuration classes
│   │   ├── CamelConfig.java
│   │   ├── GcsBlobStoreRepositoryConfig.java
│   │   ├── InMemoryBlobStoreRepositoryConfig.java
│   │   └── LocalDiskBlobStoreRepositoryConfig.java
│   ├── routes/                                 # Camel route definitions
│   │   ├── BaseRouteBuilder.java
│   │   ├── netex/notification/
│   │   │   ├── NetexImportNotificationQueueRouteBuilder.java
│   │   │   └── RestNotificationRouteBuilder.java
│   │   └── blobstore/
│   │       ├── MardukBlobStoreRoute.java
│   │       ├── NisabaBlobStoreRoute.java
│   │       └── NisabaExchangeBlobStoreRoute.java
│   ├── services/                               # Blob store service implementations
│   │   ├── AbstractBlobStoreService.java
│   │   ├── MardukBlobStoreService.java
│   │   ├── NisabaBlobStoreService.java
│   │   └── NisabaExchangeBlobStoreService.java
│   ├── event/                                  # Event handling
│   │   ├── NetexImportEventFactory.java
│   │   └── NetexImportEventKeyFactory.java
│   ├── pubsub/
│   │   └── PubSubAutoCreateEventNotifier.java
│   └── exceptions/
│       └── NisabaException.java
├── src/main/avro/
│   └── NetexImportEvent.avsc                   # Avro schema definition
├── src/main/resources/
│   └── logback.xml                             # Logging configuration
├── helm/nisaba/                                # Kubernetes deployment charts
├── Dockerfile                                  # Container image definition
└── pom.xml                                     # Maven project configuration
```
The core workflow is implemented in `NetexImportNotificationQueueRouteBuilder` with the following routes:
- `netex-export-notification-queue`: Main entry point; receives PubSub notifications
- `download-netex-dataset`: Downloads the dataset from GCS
- `retrieve-dataset-creation-time`: Parses XML files to extract creation dates
- `parse-created-attribute`: XPath-based extraction of the `created` attribute
- `notify-consumers-if-new`: Checks idempotency and triggers notification
- `notify-consumers`: Publishes to the Kafka topic
- `find-chouette-import-key`: Identifies the original Chouette dataset for mixed sources
- `copy-dataset-to-private-bucket`: Copies whitelisted datasets to a private bucket
Three service implementations handle access to different GCS buckets:
- `MardukBlobStoreService`: Accesses Marduk's export bucket (`marduk-{env}`)
- `NisabaBlobStoreService`: Manages Nisaba's own storage
- `NisabaExchangeBlobStoreService`: Handles the exchange bucket for imported datasets
The Avro schema (`NetexImportEvent.avsc`) defines the Kafka event structure:

```jsonc
{
  "codespace": "string",                  // Dataset codespace
  "importDateTime": "string",             // ISO-formatted local datetime
  "importKey": "string",                  // Unique key: <codespace>_<datetime>
  "publishedDatasetURI": "string",        // GCS link to the published dataset
  "publishedDatasetPublicLink": "string", // Public HTTPS link
  "originalDatasetURI": "string",         // GCS link to the original dataset
  "serviceJourneys": "int",               // Obsolete, always 0
  "commonFiles": "int"                    // Obsolete, always 0
}
```
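As a rough illustration of how these fields fit together, the payload can be modeled as a plain Java record. This is a stand-in for the Avro-generated class, not the actual implementation; the bucket name and URIs below are hypothetical placeholders.

```java
import java.time.LocalDateTime;

public class NetexImportEventExample {

    // Plain-Java stand-in for the Avro-generated NetexImportEvent class.
    record NetexImportEvent(String codespace, String importDateTime, String importKey,
                            String publishedDatasetURI, String publishedDatasetPublicLink,
                            String originalDatasetURI, int serviceJourneys, int commonFiles) {}

    static NetexImportEvent build(String codespace, LocalDateTime created, String bucket) {
        String file = "rb_" + codespace + "-aggregated-netex.zip";
        return new NetexImportEvent(
                codespace,
                created.toString(),
                codespace + "_" + created,                                  // <codespace>_<datetime>
                "gs://" + bucket + "/outbound/netex/" + file,
                "https://storage.googleapis.com/" + bucket + "/outbound/netex/" + file,
                // In practice the original dataset may live in a different bucket.
                "gs://" + bucket + "/outbound/netex/" + file,
                0,   // serviceJourneys: obsolete, always 0
                0);  // commonFiles: obsolete, always 0
    }

    public static void main(String[] args) {
        var event = build("avi", LocalDateTime.parse("2021-04-21T11:51:59"), "example-bucket");
        System.out.println(event.importKey()); // avi_2021-04-21T11:51:59
    }
}
```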
The end-to-end processing flow:

1. Notification Received
   - A PubSub message is received with the dataset codespace
   - A correlation ID is set for request tracing
2. Dataset Download
   - Constructs the file path: `outbound/netex/rb_<codespace>-aggregated-netex.zip`
   - Downloads from the GCS bucket (e.g., `marduk-production`)
3. Creation Date Extraction
   - Unzips the archive and iterates through the XML files
   - Uses XPath to extract the `created` attribute from `CompositeFrame` elements
   - Aggregates all creation dates into a sorted set
   - Selects the latest date as the dataset creation time
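A minimal sketch of the extraction step, using only the JDK's built-in XML and XPath support. The class name and XML snippet are illustrative, not taken from the codebase:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.util.TreeSet;

public class CreationDateExtractor {

    /** Collects all CompositeFrame 'created' attribute values from one NeTEx XML document. */
    static TreeSet<LocalDateTime> extractCreationDates(String xml) {
        try {
            var doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // local-name() matches CompositeFrame regardless of namespace prefix.
            var nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//*[local-name()='CompositeFrame']/@created", doc, XPathConstants.NODESET);
            var dates = new TreeSet<LocalDateTime>();
            for (int i = 0; i < nodes.getLength(); i++) {
                dates.add(LocalDateTime.parse(nodes.item(i).getNodeValue()));
            }
            return dates; // sorted set: last() is the latest creation date
        } catch (Exception e) {
            throw new RuntimeException("Failed to parse NeTEx XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = """
                <PublicationDelivery xmlns="http://www.netex.org.uk/netex">
                  <dataObjects>
                    <CompositeFrame created="2021-04-20T09:00:00" id="AVI:CompositeFrame:1"/>
                    <CompositeFrame created="2021-04-21T11:51:59" id="AVI:CompositeFrame:2"/>
                  </dataObjects>
                </PublicationDelivery>
                """;
        // The latest date becomes the dataset creation time.
        System.out.println(extractCreationDates(xml).last()); // 2021-04-21T11:51:59
    }
}
```

In the real route this runs per XML entry of the unzipped archive, with the dates from all files merged into one sorted set.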
4. Idempotency Check
   - Generates the unique key `<codespace>_<creation_date>` (e.g., `avi_2021-04-21T11:51:59`)
   - Checks it against the Kafka-backed idempotent repository
   - History is retained for 365 days
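The deduplication logic reduces to "publish only on first sight of a key". A minimal sketch, with a plain `HashSet` standing in for the Kafka-backed idempotent repository (class and method names are illustrative):

```java
import java.time.LocalDateTime;
import java.util.HashSet;
import java.util.Set;

public class ImportKeyDeduplicator {

    // Stand-in for the Kafka-backed idempotent repository; the real one
    // persists keys to a Kafka topic and retains 365 days of history.
    private final Set<String> seenKeys = new HashSet<>();

    /** Builds the unique import key: <codespace>_<creation_date>. */
    static String importKey(String codespace, LocalDateTime created) {
        return codespace + "_" + created;
    }

    /** Returns true only the first time a given (codespace, creation date) pair is offered. */
    boolean markIfNew(String codespace, LocalDateTime created) {
        return seenKeys.add(importKey(codespace, created));
    }

    public static void main(String[] args) {
        var dedup = new ImportKeyDeduplicator();
        var created = LocalDateTime.parse("2021-04-21T11:51:59");
        System.out.println(importKey("avi", created));       // avi_2021-04-21T11:51:59
        System.out.println(dedup.markIfNew("avi", created)); // true: first sight, publish event
        System.out.println(dedup.markIfNew("avi", created)); // false: duplicate, skip
    }
}
```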
5. Event Publication
   - If the dataset is new, publishes an Avro-encoded event to the Kafka topic
   - Kafka key: the dataset codespace
   - Topic naming: `rutedata-dataset-import-event-{env}`
6. Optional Private Copy
   - For whitelisted codespaces, copies the dataset to a private bucket
Key conventions:
- File naming convention: `rb_<codespace>-aggregated-netex.zip`
- Creation date source: the `created` attribute in NeTEx `CompositeFrame` elements
- Mixed datasets: when combining Chouette and Uttu exports, the maximum creation date is used
- Chouette identification: the original import is looked up in the exchange bucket to find the Chouette source
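The mixed-dataset rule reduces to taking the maximum over all extracted frame dates, regardless of which tool produced them. A trivial illustration (dates are made up):

```java
import java.time.LocalDateTime;
import java.util.Collections;
import java.util.List;

public class MixedDatasetCreationDate {

    /** For mixed Chouette + Uttu exports, the dataset creation time is the latest frame date. */
    static LocalDateTime datasetCreationTime(List<LocalDateTime> frameDates) {
        return Collections.max(frameDates);
    }

    public static void main(String[] args) {
        var chouetteDate = LocalDateTime.parse("2021-04-20T09:00:00"); // from the Chouette export
        var uttuDate = LocalDateTime.parse("2021-04-21T11:51:59");     // from the Uttu export
        System.out.println(datasetCreationTime(List.of(chouetteDate, uttuDate))); // 2021-04-21T11:51:59
    }
}
```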
The application supports multiple environments (dev, test, production) with corresponding:
- GCS buckets: `marduk-{env}`; storage configuration varies by environment
- Kafka topics: `rutedata-dataset-import-event-{env}`
- PubSub queues: environment-specific subscriptions
Key configuration properties:
- `marduk.pubsub.project.id`: GCP project for PubSub
- `nisaba.kafka.topic.event`: Kafka topic for publishing events
- `nisaba.netex.publication.internal.bucket`: Private bucket for whitelisted datasets
- `nisaba.netex.publication.internal.whitelist`: Codespaces eligible for the private copy
- `nisaba.shutdown.timeout`: Graceful shutdown timeout (default: 300s)
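An illustrative `application.properties` fragment showing these keys together; every value below is a placeholder, not a real project or bucket name:

```properties
# All values are examples only; set per environment.
marduk.pubsub.project.id=my-gcp-project
nisaba.kafka.topic.event=rutedata-dataset-import-event-dev
nisaba.netex.publication.internal.bucket=my-internal-bucket-dev
nisaba.netex.publication.internal.whitelist=avi,rut
nisaba.shutdown.timeout=300
```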
The Docker image:
- Base image: `bellsoft/liberica-openjre-alpine:21.0.9`
- Multi-stage build for an optimized image size
- Layer extraction for efficient caching
- Non-root user (`appuser`) for security
- Init system: `tini` for proper signal handling
Kubernetes deployment:
- Helm charts in `helm/nisaba/`
- Supports clustered deployment with leader election
- Uses Camel's `master:` component for singleton routes
- Kubernetes cluster service for leader election
Build with `mvn clean install` and run the tests with `mvn test`. Key dependencies:
- `camel-spring-boot-starter`: Camel framework integration
- `camel-google-pubsub-starter`: Google PubSub consumer
- `camel-kafka-starter`: Kafka producer
- `camel-zipfile-starter`: ZIP file processing
- `camel-xpath-starter`: XML parsing
- `entur-google-pubsub`: Entur's GCP helpers
- `storage-gcp-gcs`: GCS blob storage access
- `netex-java-model`: NeTEx data model
- `kafka-avro-serializer`: Confluent Avro serialization
Test infrastructure includes:
- `NisabaRouteBuilderIntegrationTestBase`: Base class for integration tests
- Testcontainers for GCloud emulation
- Spring Boot test support
- Camel test framework
- Marduk: Publishes NeTEx export events
- Chouette: Initial dataset import and creation date generation
- Uttu: FlexibleLines management with creation date generation
- Actuator endpoints: Health checks and metrics via Spring Boot Actuator
- Prometheus metrics: Exposed via Micrometer registry
- Structured logging: Logstash-compatible JSON logging with Logback
- MDC logging: Camel correlation IDs for request tracing
- Message history: Enabled for route debugging
- Idempotency: Critical for preventing duplicate event publication
- Leader election: Required in clustered deployments to prevent race conditions
- Graceful shutdown: 5-minute timeout ensures in-flight messages complete
- Creation date semantics: For mixed datasets, uses maximum of all CompositeFrame dates
- Historical retention: 365-day window in idempotent repository