DNDM (Decentralized Named Data Messaging) is a communication library inspired by LibP2P, Named Data Networking, Pub/Sub architectures, and ROS. Its primary goal is efficient, typed message passing between servers and devices (IoT nodes, robots, embedded boards), using protobuf message specifications and minimal copying.
- Intent-Interest Pattern: Similar to Pub/Sub but with stronger typing
  - Intent: Declares availability to publish data on a specific route
  - Interest: Declares desire to receive data on a specific route
  - Link: Automatic connection between matching Intent and Interest
- Routes: Typed, named paths for data streams
  - Format: TypeName@path (e.g., Foo@example.foobar)
  - Supports plain routes and hashed routes for security
  - Routes combine message type with path for type-safe routing
- Endpoints: Abstraction layer for different communication methods
  - Direct: Same process, zero-copy via channels
  - Remote: Different processes/systems via network connections (TCP/UDP/Serial)
  - Mesh: Distributed peer-to-peer overlay with automatic discovery
  - Extensibility: Additional endpoints/connectors can be added (e.g., via x/transport) without changing the router
- Peer-Based Routing: Path-based prefix matching for distributed routing
  - Each peer has a unique path prefix
  - Routes are matched based on peer path prefixes
  - Enables hierarchical namespace distribution
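
The route format and peer prefix matching described above can be illustrated with a small Go sketch. `Route`, `ParseRoute`, and `HasPeerPrefix` are hypothetical names chosen for this example and are not DNDM's actual API. Treating the path as dot-separated segments is an assumption based on the `Foo@example.foobar` example, and hashed routes are not covered.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Route is a hypothetical representation of a plain "TypeName@path" route.
type Route struct {
	Type string // message type name, e.g. "Foo"
	Path string // path, e.g. "example.foobar"
}

// ParseRoute splits "TypeName@path" at the first '@'.
func ParseRoute(s string) (Route, error) {
	typ, path, ok := strings.Cut(s, "@")
	if !ok || typ == "" || path == "" {
		return Route{}, errors.New("route must have the form TypeName@path")
	}
	return Route{Type: typ, Path: path}, nil
}

// HasPeerPrefix reports whether the route path lies under the peer's path
// prefix. Matching is done on whole segments (assumed to be dot-separated),
// so prefix "example" matches "example.foobar" but not "examples.foobar".
func HasPeerPrefix(r Route, prefix string) bool {
	return r.Path == prefix || strings.HasPrefix(r.Path, prefix+".")
}

func main() {
	r, err := ParseRoute("Foo@example.foobar")
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Type, r.Path)
	fmt.Println(HasPeerPrefix(r, "example"))
	fmt.Println(HasPeerPrefix(r, "exam"))
}
```

Matching on whole segments rather than raw string prefixes is a design choice of this sketch; it avoids one peer accidentally claiming routes that belong to a sibling with a similar name.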
graph TB
Router[Router]
Router --> IRouter[Intent Routers]
Router --> INTRouter[Interest Routers]
Router --> Linker[Linker]
Router --> DirectEP[Direct Endpoint]
Router --> RemoteEP[Remote Endpoint]
Router --> MeshEP[Mesh Endpoint]
DirectEP --> Linker
RemoteEP --> Linker
MeshEP --> Container[Container]
Container --> RemoteEP2[Remote Endpoints]
RemoteEP --> Network[Network Layer]
Network --> Codec[Codec]
Network --> TCP[TCP/UDP]
Network --> Serial[Serial]
Network --> NATS[NATS - Planned]
style Router fill:#e1f5ff
style Linker fill:#fff5e1
style DirectEP fill:#e1ffe1
style RemoteEP fill:#ffe1f5
style MeshEP fill:#f0e1ff
- Router: Top-level component managing all endpoints
  - Routes intents/interests across endpoints
  - Manages IntentRouters and InterestRouters
- Direct Endpoint: In-process communication
  - Uses channels for zero-copy message passing
  - Links Intent directly to Interest via channels
- Remote Endpoint: Cross-process/system communication
  - Manages network connections
  - Handles intent/interest propagation
  - Manages remote intent/interest wrapping
- Mesh Endpoint: Distributed full-mesh network
  - Manages multiple remote endpoints
  - Peer discovery and address book
  - Handshake protocol for connection establishment
- Linker: Connects Intent with Interest when routes match
  - Manages lifecycle of links
  - Handles notifications when links are established
- IntentRouter: Fan-out to multiple intents
  - Routes messages from one source to multiple destinations
  - Manages wrappers for multiple subscribers
- InterestRouter: Fan-in from multiple interests
  - Routes messages from multiple sources to one destination
  - Manages wrappers for multiple publishers
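
To make the fan-out idea concrete, here is a minimal sketch of one source delivering the same message pointer to several in-process subscribers over Go channels. `FanOut`, `Subscribe`, `Send`, and `Message` are hypothetical names for this illustration only and do not reflect DNDM's real types. Dropping a message when a subscriber's buffer is full is an assumption of the sketch, not documented library behavior.

```go
package main

import (
	"fmt"
	"sync"
)

// Message stands in for a protobuf message; only the pointer is passed around.
type Message struct{ Text string }

// FanOut is a hypothetical, simplified fan-out: one Send offers the same
// *Message pointer to every subscriber channel, so the payload is not copied.
type FanOut struct {
	mu   sync.RWMutex
	subs []chan *Message
}

// Subscribe registers a new buffered channel and returns its receive side.
func (f *FanOut) Subscribe(buf int) <-chan *Message {
	ch := make(chan *Message, buf)
	f.mu.Lock()
	f.subs = append(f.subs, ch)
	f.mu.Unlock()
	return ch
}

// Send offers msg to every subscriber without blocking and returns how many
// accepted it. A full channel indicates a slow consumer; this sketch simply
// skips that subscriber.
func (f *FanOut) Send(msg *Message) int {
	f.mu.RLock()
	defer f.mu.RUnlock()
	delivered := 0
	for _, ch := range f.subs {
		select {
		case ch <- msg:
			delivered++
		default: // buffer full: drop for this subscriber
		}
	}
	return delivered
}

func main() {
	var f FanOut
	a, b := f.Subscribe(1), f.Subscribe(1)
	m := &Message{Text: "hello"}
	fmt.Println(f.Send(m))
	fmt.Println(<-a == m, <-b == m)
}
```

Because every subscriber receives the same pointer, receivers in a design like this need to treat messages as read-only.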
graph LR
Pub[Publisher] -->|Publish route| Router[Router]
Router --> EP[Endpoint]
EP --> IR[IntentRouter]
IR --> LI1[LocalIntent]
IR --> RI[RemoteIntent]
RI --> Net[Network]
Net --> Remote[Remote Peer]
style Pub fill:#e1ffe1
style Remote fill:#ffe1f5
graph LR
Sub[Subscriber] -->|Subscribe route| Router[Router]
Router --> EP[Endpoint]
EP --> INR[InterestRouter]
INR --> LI1[LocalInterest]
INR --> RI[RemoteInterest]
RI --> Net[Network]
Net --> Remote[Remote Peer]
style Sub fill:#e1ffe1
style Remote fill:#ffe1f5
sequenceDiagram
participant P as Publisher
participant IR as IntentRouter
participant FOI as FanOutIntent
participant LI as LocalIntent
participant RI as RemoteIntent
participant N as Network
participant R as Remote
participant IN as Interest
participant INR as InterestRouter
participant S as Subscriber
P->>IR: Send(message)
IR->>FOI: Route
FOI->>LI: Fan-out
FOI->>RI: Fan-out
RI->>N: Encode & Send
N->>R: Transmit
R->>IN: Decode & Deliver
IN->>INR: Route
INR->>S: Deliver via C()
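
The last step of the sequence, delivery to the subscriber through `C()`, can be sketched on the consumer side as follows. This is an illustration under assumptions: `Interest` here is a hypothetical stand-in whose `C()` returns a channel of untyped values (a real system would carry `proto.Message`), `Foo` stands in for a generated protobuf type, and `Typed` is an invented helper showing one way generics could remove type assertions from consumer code.

```go
package main

import "fmt"

// Interest is a hypothetical subscriber handle; C() exposes received messages.
type Interest struct{ ch chan any }

// C returns the receive side of the delivery channel.
func (i *Interest) C() <-chan any { return i.ch }

// Foo stands in for a generated protobuf message type.
type Foo struct{ Text string }

// Typed adapts an untyped channel to a typed one. Values of any other type
// are skipped, and the output channel is closed when the input is closed.
func Typed[T any](in <-chan any) <-chan T {
	out := make(chan T)
	go func() {
		defer close(out)
		for v := range in {
			if t, ok := v.(T); ok {
				out <- t
			}
		}
	}()
	return out
}

func main() {
	in := &Interest{ch: make(chan any, 2)}
	in.ch <- &Foo{Text: "hello"}
	in.ch <- "unexpected type"
	close(in.ch)
	for foo := range Typed[*Foo](in.C()) {
		fmt.Println(foo.Text)
	}
}
```

The adapter goroutine exits only when the input channel closes and the consumer keeps reading, so a production version would likely also take a `context.Context` for cancellation.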
[Magic (4 bytes)] [Total Size (4 bytes)] [Header Size (4 bytes)]
[Header (variable)] [Message Size (4 bytes)] [Message (variable)]
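
The frame layout above can be sketched as an encoder and decoder in Go. Several details are assumptions made for this example because the layout alone does not specify them: the magic value (`DNDM`), big-endian byte order, and the meaning of "total size" (taken here as the number of bytes following that field). `EncodeFrame` and `DecodeFrame` are hypothetical names, and the header and message are treated as opaque bytes rather than protobuf messages.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// Assumed for this sketch only; the real magic value is not given here.
var magic = [4]byte{'D', 'N', 'D', 'M'}

// EncodeFrame lays out: magic | total size | header size | header |
// message size | message. All sizes are 4-byte big-endian (assumed).
func EncodeFrame(header, msg []byte) []byte {
	total := 4 + len(header) + 4 + len(msg) // bytes after the total size field
	buf := make([]byte, 0, 8+total)
	buf = append(buf, magic[:]...)
	buf = binary.BigEndian.AppendUint32(buf, uint32(total))
	buf = binary.BigEndian.AppendUint32(buf, uint32(len(header)))
	buf = append(buf, header...)
	buf = binary.BigEndian.AppendUint32(buf, uint32(len(msg)))
	buf = append(buf, msg...)
	return buf
}

// DecodeFrame validates the sizes and returns sub-slices of b, so the header
// and message bytes are not copied.
func DecodeFrame(b []byte) (header, msg []byte, err error) {
	if len(b) < 12 || !bytes.Equal(b[:4], magic[:]) {
		return nil, nil, errors.New("short frame or bad magic")
	}
	total := binary.BigEndian.Uint32(b[4:8])
	if uint64(len(b)-8) != uint64(total) {
		return nil, nil, errors.New("total size mismatch")
	}
	body := b[8:]
	hlen := binary.BigEndian.Uint32(body[:4])
	// body must hold the header size field, the header, and the message size field.
	if uint64(len(body)) < 4+uint64(hlen)+4 {
		return nil, nil, errors.New("header size out of range")
	}
	header = body[4 : 4+hlen]
	rest := body[4+hlen:]
	mlen := binary.BigEndian.Uint32(rest[:4])
	if uint64(len(rest)-4) != uint64(mlen) {
		return nil, nil, errors.New("message size mismatch")
	}
	return header, rest[4:], nil
}

func main() {
	frame := EncodeFrame([]byte("hdr"), []byte("payload"))
	h, m, err := DecodeFrame(frame)
	fmt.Println(len(frame), string(h), string(m), err)
}
```

Returning sub-slices keeps decoding allocation-free, at the cost that the caller must not reuse the input buffer while the header or message is still in use.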
- receive_timestamp: Set by receiver
- timestamp: Set by sender
- type: Message type enum (MESSAGE, INTENT, INTEREST, etc.)
- want_result: Request response
- signature: For authentication (future)
- route: Route identifier
- MESSAGE: Actual data payload
- INTENT: Intent advertisement
- INTENTS: Batch intent advertisement
- INTEREST: Interest advertisement
- INTERESTS: Batch interest advertisement
- NOTIFY_INTENT: Notification of matching interest
- PING/PONG: Latency measurement
- HANDSHAKE: Connection establishment
- PEERS: Peer discovery
- ADDRBOOK: Address book synchronization
- RESULT: Response to request
- Route Matching: How should route matching work in mesh networks?
  - Current: Prefix matching based on peer paths
  - Question: Should we support wildcard routes? Regex? Hierarchical matching rules?
- Hashed Routes: When and how should hashed routes be used?
  - Current: Basic implementation exists
  - Question: What's the security model? How are capabilities distributed?
  - Question: How do peers discover hashed routes without knowing the hash?
- Route Collision: How to handle route collisions?
  - Current: TBD in README
  - Question: Should we support multiple publishers for same route?
  - Question: What happens when two peers have same path but different addresses?
- Peer Discovery: What discovery mechanisms should be supported?
  - Current: Address book with manual/peer-provided entries
  - Question: Should we support mDNS/Bonjour? DHT? Centralized registry?
  - Question: How to handle NAT traversal?
- Connection Management: How should connections be managed?
  - Current: Full-mesh with automatic dialing
  - Question: Should we support partial mesh? Star topology? Hierarchical?
  - Question: How to handle connection failures and reconnection strategies?
- Transport Protocols: What transport protocols should be supported?
  - Current: TCP/UDP via network package
  - Question: Should we support WebSocket? QUIC? (Serial ports are already supported.)
  - Question: How to handle unreliable transports?
- Message Batching: Should we batch intents/interests?
  - Current: INTENTS/INTERESTS message types exist
  - Question: When should batching be used? Automatic? Configuration?
- Buffer Management: How to handle backpressure?
  - Current: Fixed-size channels
  - Question: Should we support dynamic sizing? Dropping old messages?
  - Question: How to handle slow consumers?
- Zero-Copy: How to maximize zero-copy across network boundaries?
  - Current: Direct endpoint uses channels (zero-copy in-process)
  - Question: Can we use shared memory for local network communication?
  - Question: How to handle message serialization with minimal copying?
- Authentication: How should peers authenticate?
  - Current: No authentication
  - Question: What authentication mechanisms? TLS? mTLS? Token-based?
  - Question: Should authentication be per-peer or per-route?
- Authorization: How to control access to routes?
  - Current: Path-based matching only
  - Question: Should we support ACLs? Capability-based security?
  - Question: How to handle permissions for hashed routes?
- Message Signing: How should message signatures work?
  - Current: Signature field in header (not used)
  - Question: Should all messages be signed? Only control messages?
  - Question: What signing algorithm? Key management?
- Message Delivery: What delivery guarantees should be provided?
  - Current: Best-effort delivery
  - Question: Should we support at-least-once? Exactly-once? Ordered delivery?
  - Question: How to handle message acknowledgment?
- Failure Recovery: How should the system handle failures?
  - Current: Basic error handling, connection closure
  - Question: Should we support automatic retry? Circuit breakers?
  - Question: How to handle partial failures in mesh networks?
- State Synchronization: How to handle state changes?
  - Current: Intent/Interest advertised on creation
  - Question: Should we support re-advertisement on reconnection?
  - Question: How to handle network partitions?
- Type Safety: How to improve type safety at API level?
  - Current: Route requires proto.Message type
  - Question: Should we support typed wrappers? Generics?
  - Question: How to avoid type casting at consumer side?
- Error Handling: How to handle and propagate errors?
  - Current: Error return values
  - Question: Should we support error channels? Error callbacks?
  - Question: How to distinguish transient vs permanent errors?
- Configuration: How should the library be configured?
  - Current: Options pattern
  - Question: Should we support configuration files? Environment variables?
  - Question: What configuration should be exposed?
- Observability: What metrics and logs should be exposed?
  - Current: Basic logging with slog
  - Question: What metrics should be tracked? Latency? Throughput? Error rates?
  - Question: Should we support distributed tracing?
- Testing: How to test distributed behavior?
  - Current: Unit tests for components
  - Question: Should we support integration tests? Chaos testing?
  - Question: How to test mesh network behavior?
- Race Conditions:
  - Multiple goroutines accessing shared state in Linker, Router
  - Channel closure race conditions in StreamContext
- Resource Leaks:
  - Goroutine leaks if contexts aren't properly cancelled
  - Buffer pool management could be improved
- Error Handling:
  - Some errors are logged but not propagated
  - Result handling is awkward (TODO comment in remote/messages.go)
- Handshake Protocol:
  - Very rudimentary (FIXME comment in handshake.go)
  - Missing intent/interest synchronization during handshake
- Performance:
  - No message batching implementation
  - Serial encoding/decoding could be optimized
  - No connection pooling for network connections
- Documentation:
  - Limited API documentation
  - No usage examples for mesh networks
  - Missing architecture diagrams
- Protocol Enhancement:
  - Implement proper handshake with intent/interest exchange
  - Add message batching
  - Implement flow control
- Reliability:
  - Add retry mechanisms
  - Implement circuit breakers
  - Add health checks
- Security:
  - Implement authentication
  - Add message signing
  - Support encryption
- Observability:
  - Add structured metrics
  - Implement distributed tracing
  - Add performance profiling
- Testing:
  - Add integration tests
  - Implement chaos testing
  - Add performance benchmarks
- API Ergonomics:
  - Add typed wrappers
  - Improve error handling
  - Add builder patterns where appropriate
DNDM is designed for controlled environments where the system topology and message types are known at design time. The primary use case is as an internal message bus for robotics applications.
- Known Intent/Interest Set: The set of intents and interests is largely known at design time
- Known Clients/Producers: The set of clients and producers is known and controlled
- Internal Communication: Primary focus is internal robot communication, not public-facing APIs
graph TB
subgraph "In-Process (Zero Copy)"
Channels[Go Channels<br/>Direct Endpoint]
end
subgraph "LAN Communication"
Net[TCP/UDP<br/>Network Endpoint]
end
subgraph "Embedded Devices"
Serial[Serial Port<br/>Serial Endpoint]
end
subgraph "User Interface"
NATS[NATS<br/>For user commands]
CLI[CLI/Bluetooth<br/>Keyboard/Joystick]
NetworkUI[Network<br/>Web/API]
end
Channels -->|Within computer| Computer[Single Computer]
Net -->|Between computers| Computers[Multiple Computers LAN]
Serial -->|Computer to embedded| Embedded[RP2040/ESP32]
NATS -->|User commands| Robot
CLI -->|Direct input| Robot
NetworkUI -->|Remote control| Robot
style Channels fill:#e1ffe1
style Net fill:#e1f5ff
style Serial fill:#ffe1f5
style NATS fill:#fff5e1
Transport Selection:
- In-Process: Go channels via Direct endpoint (zero-copy)
- LAN (Computer-to-Computer): TCP/UDP via Network endpoint
- Computer-to-Embedded: Serial ports (e.g., Raspberry Pi ↔ RP2040/ESP32)
- User Interaction:
  - Network (TCP/UDP)
  - NATS (for cloud/external commands)
  - CLI on device (e.g., RPI with Bluetooth keyboard/joystick)
The system may use UDP broadcast to reduce network congestion: a message with several receivers is sent once rather than once per receiver. When using broadcast:
- Messages are sent to all instances on the network
- Instances without a matching interest should reject packets early (at the network layer)
- Only interested instances process the message
- This keeps network overhead low while avoiding wasted processing on uninterested instances
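
The early-rejection step could look roughly like the following sketch. It rests on an assumption that is not part of the documented wire format: each broadcast datagram is prefixed with an 8-byte big-endian key derived from the route, so a receiver can decide without decoding the header or payload. `Filter`, `routeKey`, and `prefix` are hypothetical names, and FNV-1a is used only because it is available in the standard library.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// routeKey hashes a route string to a fixed-size key (FNV-1a, sketch only).
func routeKey(route string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(route))
	return h.Sum64()
}

// Filter holds the keys of routes this instance has an Interest in.
type Filter map[uint64]struct{}

// Add registers interest in a route.
func (f Filter) Add(route string) { f[routeKey(route)] = struct{}{} }

// Accept inspects only the first 8 bytes of a datagram (assumed layout:
// route key, then the rest of the packet) and reports whether the packet
// should be passed on for full decoding.
func (f Filter) Accept(packet []byte) bool {
	if len(packet) < 8 {
		return false
	}
	_, ok := f[binary.BigEndian.Uint64(packet[:8])]
	return ok
}

// prefix builds a demo packet: route key followed by the payload.
func prefix(route string, payload []byte) []byte {
	p := binary.BigEndian.AppendUint64(nil, routeKey(route))
	return append(p, payload...)
}

func main() {
	f := Filter{}
	f.Add("Foo@example.foobar")
	fmt.Println(f.Accept(prefix("Foo@example.foobar", []byte("data"))))
	fmt.Println(f.Accept(prefix("Bar@example.other", []byte("data"))))
}
```

Since different routes can in principle hash to the same key, a filter like this is only a cheap pre-check; the full route would still need to be compared after decoding.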
Some checks, limits, and validations are intentionally omitted in this implementation:
- Runtime Validation: Type checking and route validation assume known, controlled routes
- Rate Limiting: Not implemented by default (can be added as middleware)
- Access Control: No authentication/authorization (trusted environment)
- Resource Limits: No enforced limits on connections, routes, or message sizes
Rationale: These can be easily added later due to the modular architecture:
- Middleware pattern for rate limiting
- Wrapper functions for validation
- Endpoint-level access control
- Configuration-based resource limits
Design goals:
- Efficient typed message passing (zero-copy where possible)
- Minimal copying (direct channels in-process, efficient serialization for network)
- Support for distributed systems (multiple computers, embedded devices)
- Protobuf-based type system (compile-time type safety)
- Works across different network topologies (LAN, serial, in-process)

Current focus:
- Single process communication (Direct endpoint)
- LAN communication (TCP/UDP via Network endpoint)
- Serial communication (for embedded devices)
- Basic mesh networking (for automatic peer discovery)
- Zero-copy optimizations (in-process channels)

Deferred (can be added later):
- Full authentication/authorization (when needed for less controlled environments)
- Advanced error recovery (circuit breakers, retries)
- Rate limiting middleware
- Resource limit enforcement
- Advanced security features