Skip to content

MDS Kafka Streaming: Design and Concepts

Igor Petrov edited this page Sep 17, 2025 · 2 revisions

Introduction

The Mobility Data Space (MDS) enables secure, federated sharing of mobility and transportation data between organizations. This document explains the design and core concepts for implementing real-time data streaming using Apache Kafka within the MDS ecosystem, leveraging the Eclipse Dataspace Connector (EDC) framework and MDS Kafka extension.

Key Concepts

Data Space

A federated ecosystem where organizations share data under agreed-upon rules and policies. Data spaces enable secure, interoperable data sharing while maintaining data sovereignty and control.

EDC Connector

A framework that facilitates secure data sharing between data space participants. The Eclipse Dataspace Connector (EDC) provides:

  • Data Plane: Handles actual data transfer operations
  • Control Plane: Manages metadata, contracts, and policies

Kafka Streaming

Real-time data streaming using Apache Kafka topics, enabling high-throughput, fault-tolerant data distribution across the data space.

Contract Negotiation

The process of establishing data usage agreements between providers and consumers. This includes:

  • Policy evaluation
  • Terms agreement
  • Access credential generation
  • Usage monitoring

EDR (Endpoint Data Reference)

Secure references that contain access credentials for data endpoints. EDRs include:

  • Authentication tokens
  • Connection parameters
  • Access permissions
  • Usage constraints

Architecture Overview

[tbd]

Data Space Protocol

Contract Negotiation Flow

Contract negotiation follows a standardized state machine:

[tbd]

Key Message Types

  • Contract Request: Consumer initiates negotiation for a specific dataset
  • Contract Agreement: Provider responds with terms and conditions
  • EDR (Endpoint Data Reference): Contains credentials and connection details for data access

MDS EDC with Kafka Extension

MDS Kafka Data Plane Extension Components

The MDS Kafka Extension provides:

1. data-plane-kafka

Manages Kafka topic access by:

  • Creating dynamic Kafka credentials
  • Managing Access Control Lists (ACLs)
  • Handling SASL authentication
  • Supporting OIDC-based client registration

2. data-plane-kafka-spi

Defines the DataAddress format for Kafka assets:

{
  "type": "Kafka", 
  "topic": "mobility-events", 
  "kafka.bootstrap.servers": "kafka.example.com:9092", 
  "kafka.sasl.mechanism": "OAUTHBEARER",
  "kafka.security.protocol": "SASL_SSL", 
  "oidc.discovery.url": "<https://auth.example.com/.well-known/openid_configuration>",
  "oidc.client.registration.endpoint": "<https://auth.example.com/clients>"
}

Security Model

The extension implements multi-layered security:

  1. EDC-level: Contract-based access control2.
  2. Kafka-level: SASL authentication with dynamic credentials3.
  3. Network-level: TLS encryption for all communications4.
  4. Application-level: OIDC tokens for client authentication

Monitoring and Observability

Key metrics to monitor:

  • Kafka Consumer Metrics: Lag, throughput, error rates
  • Database Metrics: Connection pool, query performance
  • Application Metrics: Event processing rates, error counts
  • Infrastructure Metrics: CPU, memory, network usage

Scaling Considerations

  • Consumer Pool: Configure thread pool sizes based on expected load
  • Database: Use connection pooling and consider read replicas
  • Kafka: Partition topics appropriately for parallel processing
  • Load Balancing: Use multiple backend instances behind a load balancer

Performance Tuning

  • Kafka Consumer: Adjust max.poll.records and fetch.min.bytes
  • Database: Tune connection pool size and query timeouts
  • JVM: Configure heap size and garbage collection settings
  • Monitoring: Set up alerts for key metrics

Clone this wiki locally