index.bs
Lines changed: 105 additions & 4 deletions
@@ -333,6 +333,40 @@ This results in the following JSON object:
333
333
334
334
# RDF-Connect by Layer
335
335
336
+
Communication between the orchestrator and the runners happens using a strongly typed protocol defined in Protocol Buffers (protobuf).
337
+
This enables language-independent and efficient communication across processes and machines.
338
+
339
+
The orchestrator hosts the protobuf server and is the central point of communication.
340
+
The orchestrator starts all runners and notifies the runners of the different processors they should start.
341
+
The orchestrator also acts as the message post office: it forwards each message to the correct runner, which in turn relays it to the correct processor.
342
+
343
+
344
+
## Communication Protocol: Orchestrator ↔ Runner
345
+
346
+
RDF-Connect uses a bidirectional communication protocol based on Protocol Buffers (protobuf) for interaction between the **Orchestrator** and **Runners**.
347
+
The orchestrator manages execution, while runners host and execute individual processors.
348
+
349
+
### Messages Sent to Runners
350
+
351
+
The orchestrator can send the following messages to the runner:
352
+
353
+
* Rpc.pipeline: Contains the complete pipeline configuration (in Turtle syntax).
354
+
* Rpc.proc: Instructs the runner to register and set up a new processor.
355
+
* Rpc.start: Signals that all added processors should begin execution.
356
+
* Rpc.close: Instructs the runner to gracefully shut down and terminate all processors.
357
+
* Rpc.msg: Delivers a message to a specific processor. Used for normal data transfer.
358
+
* Rpc.streamMsg: Begins a streaming message transmission to a processor, typically for large or chunked payloads.
359
+
360
+
### Messages Sent from Runners
361
+
362
+
The runner can send the following messages to the orchestrator:
363
+
364
+
* Rpc.identify: Indicates that the runner is ready and provides its unique identifier (IRI).
365
+
* Rpc.init: Confirms that a previously registered **processor** has successfully started.
366
+
* Rpc.close: Notifies that the runner is shutting down because all processors have stopped.
367
+
* Rpc.msg: Sends a message from a processor to another processor via the orchestrator.
368
+
* Rpc.streamMsg: Sends a streaming message from a processor. Used when the payload is too large for a single message.
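
The two message lists above can be summarised in code. The following is a minimal sketch, not the actual protobuf schema: the enum and function names (`ToRunner`, `FromRunner`, `allowed_directions`) are hypothetical, and only the message names themselves come from the lists above.

```python
from enum import Enum


class ToRunner(Enum):
    """Message kinds the orchestrator may send to a runner."""
    PIPELINE = "pipeline"
    PROC = "proc"
    START = "start"
    CLOSE = "close"
    MSG = "msg"
    STREAM_MSG = "streamMsg"


class FromRunner(Enum):
    """Message kinds a runner may send to the orchestrator."""
    IDENTIFY = "identify"
    INIT = "init"
    CLOSE = "close"
    MSG = "msg"
    STREAM_MSG = "streamMsg"


def allowed_directions(kind):
    """Return the set of directions in which a message kind may travel."""
    directions = set()
    if kind in {m.value for m in ToRunner}:
        directions.add("orchestrator->runner")
    if kind in {m.value for m in FromRunner}:
        directions.add("runner->orchestrator")
    return directions
```

Note that `close`, `msg` and `streamMsg` appear in both lists, so a receiver would need to interpret them according to the direction in which they arrive.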
369
+
336
370
337
371
## Orchestrator
338
372
@@ -355,13 +389,80 @@ Responsibilities:
355
389
The remainder of this section is intended for developers building custom runners or integrating RDF-Connect into infrastructure.
356
390
</div>
357
391
358
-
### Protobuf Messaging Protocol
359
-
360
-
Communication between the orchestrator and the runners happens using a strongly typed protocol defined in Protocol Buffers (protobuf).
361
-
This enables language-independent and efficient communication across processes and machines.
362
392
363
393
### Startup Flow
364
394
395
+
#### Understanding the pipeline file
396
+
397
+
The orchestrator begins execution from a single pipeline RDF file. This file **MUST** first be expanded by resolving all `owl:imports` statements recursively.
398
+
Once the full RDF graph is assembled, the orchestrator extracts the pipeline to execute by locating a `rdfc:Pipeline` instance whose **subject is the pipeline file itself**.
399
+
The pipeline is composed of one or more **runner–processor pairs**, defined via the `rdfc:consistsOf` property.
400
+
Each pair includes:
401
+
402
+
- A reference to a runner, using the `rdfc:instantiates` property.
403
+
- One or more processors, referenced with the `rdfc:processor` property.
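
The recursive expansion of `owl:imports` described above can be sketched as a graph traversal. This is an illustrative sketch only: the function name `expand_imports` and the `imports_of` callback (which is assumed to return the documents a given document imports) are hypothetical, and an iterative worklist is used in place of literal recursion.

```python
from typing import Callable, Iterable, Set


def expand_imports(root: str, imports_of: Callable[[str], Iterable[str]]) -> Set[str]:
    """Return every document reachable from `root` through owl:imports."""
    seen: Set[str] = set()
    pending = [root]
    while pending:
        doc = pending.pop()
        if doc in seen:
            continue  # already loaded; this also guards against import cycles
        seen.add(doc)
        # Queue the documents that this document imports.
        pending.extend(imports_of(doc))
    return seen
```

Keeping a `seen` set means that a document imported from several places is loaded once, and that a cycle of imports does not loop forever.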
404
+
405
+
406
+
<div class="example" title="Pipeline definition">
407
+
408
+
This example pipeline contains three processors divided over two runners.
409
+
For this pipeline, the orchestrator starts two runners and provides them with the correct processor configurations.
- the language it supports, linked with `rdfc:handlesSubjectsOf`
430
+
- how the runner should be started, linked with `rdfc:command`
431
+
432
+
<aside class="note">
433
+
The property `rdfc:handlesSubjectsOf` plays a somewhat unconventional role. It allows RDF-Connect to remain aligned with [PROV-O](https://www.w3.org/TR/prov-o/).
434
+
A runner may link to a predicate such as `rdfc:jsImplementationOf`, and each JavaScript processor may then declare itself using:
435
+
```turtle
436
+
<FooBar> rdfc:jsImplementationOf rdfc:Processor.
437
+
```
438
+
Since `rdfc:jsImplementationOf` is a subproperty of `rdfs:subClassOf`, this implies that `<FooBar>` is a subclass of `rdfc:Processor`, which itself is a subclass of `prov:Activity`.
439
+
Therefore, any instance of `<FooBar>` can be inferred to be a `prov:Activity`.
This inference enables seamless integration with provenance-aware tooling.
445
+
</aside>
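
The inference chain in the note above can be reproduced with three RDFS-style rules over plain tuples. This is a toy sketch rather than how RDF-Connect performs reasoning: the function `infer_types` is hypothetical, terms are plain strings instead of IRIs, and `ex:myProcessor` is an invented instance of `<FooBar>`.

```python
SUB_PROPERTY_OF = "rdfs:subPropertyOf"
SUB_CLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer_types(triples):
    """Apply three RDFS-style rules until no new triple appears."""
    facts = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # Subproperty: (s p o) and (p subPropertyOf q) give (s q o).
                if p2 == SUB_PROPERTY_OF and s2 == p:
                    new.add((s, o2, o))
                # Subclass transitivity: A < B and B < C give A < C.
                if p == SUB_CLASS_OF and p2 == SUB_CLASS_OF and o == s2:
                    new.add((s, SUB_CLASS_OF, o2))
                # Type inheritance: (x type A) and A < B give (x type B).
                if p == TYPE and p2 == SUB_CLASS_OF and o == s2:
                    new.add((s, TYPE, o2))
        changed = not (new <= facts)
        facts |= new
    return facts


facts = infer_types({
    ("rdfc:jsImplementationOf", SUB_PROPERTY_OF, SUB_CLASS_OF),
    ("<FooBar>", "rdfc:jsImplementationOf", "rdfc:Processor"),
    ("rdfc:Processor", SUB_CLASS_OF, "prov:Activity"),
    ("ex:myProcessor", TYPE, "<FooBar>"),  # hypothetical instance
})
```

After the rules reach a fixed point, `facts` contains the triple stating that `ex:myProcessor` is a `prov:Activity`.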
446
+
<aside class="note">
447
+
Currently, only **command-line-based runners** are supported and defined, but the model is intentionally extensible.
448
+
In the future, runners might be deployed remotely and communicate over a network. Such runners would not require a `rdfc:command`, but instead might define a connection endpoint (e.g., a URL or service descriptor).
449
+
</aside>
450
+
451
+
For each runner, the orchestrator appends two arguments to the configured command: the URL of the orchestrator's running Protobuf server and the IRI identifying the runner instance.
452
+
It then executes the resulting command to spawn the runner process.
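
A minimal sketch of how the final command line could be assembled is shown below. The function name `build_runner_command` is hypothetical, and the sketch assumes that the configured command is a shell-style string and that the two arguments are appended in the order in which they are listed above (server URL first, runner IRI second).

```python
import shlex
from typing import List


def build_runner_command(command: str, orchestrator_url: str, runner_iri: str) -> List[str]:
    """Split the configured command and append the two extra arguments."""
    argv = shlex.split(command)  # respects quoting inside the configured command
    # Assumed order: orchestrator URL first, runner IRI second.
    return argv + [orchestrator_url, runner_iri]
```

The resulting list could then be passed to `subprocess.Popen` to spawn the runner process without going through a shell.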
453
+
454
+
The orchestrator **MUST** track all active runners, specifically recording which ones have sent an Rpc.identify message after startup. This message confirms that the runner is ready to accept processor assignments.
455
+
456
+
Once all expected runners have successfully identified themselves, the orchestrator proceeds to the next step in the pipeline initialization sequence.
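
The bookkeeping described above can be sketched with two sets. The class `RunnerTracker` and its method names are hypothetical and only illustrate the idea of waiting until every expected runner has sent Rpc.identify.

```python
class RunnerTracker:
    """Track which expected runners have identified themselves."""

    def __init__(self, expected_iris):
        self._expected = set(expected_iris)
        self._identified = set()

    def mark_identified(self, runner_iri):
        """Record an identify message received from a runner."""
        if runner_iri not in self._expected:
            raise ValueError(f"unexpected runner: {runner_iri}")
        self._identified.add(runner_iri)

    def all_identified(self):
        """Return True once every expected runner has identified itself."""
        return self._identified == self._expected
```

An orchestrator following this sketch would call `mark_identified` for each incoming identify message and move on to starting processors once `all_identified()` returns `True`.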
457
+
458
+
#### Starting processors
459
+
460
+
461
+
<div class=issue>
462
+
🚧 This section is a work in progress and will be expanded soon.
463
+
</div>
464
+
465
+
365
466
The following diagram describes the startup sequence handled by the orchestrator. This includes validating pipeline structure, instantiating runners, and initializing processor instances.