You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: index.bs
+34-34Lines changed: 34 additions & 34 deletions
Original file line number
Diff line number
Diff line change
@@ -50,7 +50,7 @@ This streaming model ensures efficient memory usage and enables continuous trans
50
50
The configuration of pipelines, processors, inputs, and outputs is expressed semantically using RDF.
51
51
This use of **semantic configuration** promotes clarity, extensibility, and interoperability by describing the structure and behavior of the system in a machine-readable, standards-based format.
52
52
53
-
Processor components in RDFConnect are designed for **reusability**.
53
+
Processor components in RDF-Connect are designed for **reusability**.
54
54
A processor defined once can be reused across multiple pipelines and in varying contexts, reducing duplication and encouraging modular design practices.
55
55
This modularity fosters rapid prototyping and easier maintenance of complex workflows.
56
56
@@ -78,7 +78,7 @@ The pipeline configuration itself is expressed in RDF, making it semantically ex
78
78
A *processor* is a modular, reusable software component that performs a discrete data processing task.
79
79
In a typical scenario, a *processor* receives input data through a *reader*, transforms it according to its logic, and emits the result via a *writer*.
80
80
However, processors are also flexible enough to support single-directional tasks, such as those that only produce or only consume data.
81
-
Crucially, processors are implementation-agnostic --- they can be written in any programming language and integrated into pipelines via language-specific runners.
81
+
Crucially, processors are implementation-agnostic — they can be written in any programming language and integrated into pipelines via language-specific runners.
82
82
This makes processors the building blocks of RDF-Connect’s cross-language interoperability.
83
83
84
84
## Runner
@@ -120,7 +120,7 @@ Shapes define required and optional configuration properties, which are transfor
120
120
In RDF-Connect, such shapes are used to describe required parameters.
121
121
They result in a well-typed JSON object that developers can rely on during implementation.
122
122
123
-
Shacl shape defining some required configuration for a processor
123
+
SHACL shape defining some required configuration for a processor
124
124
125
125
```turtle
126
126
[] a sh:NodeShape;
@@ -328,29 +328,29 @@ The orchestrator is also the message post office, allowing messages to be sent t
328
328
329
329
## Communication Protocol: Orchestrator ↔ Runner
330
330
331
-
RDFConnect uses a bidirectional communication protocol based on Protocol Buffers (protobuf) for interaction between the **Orchestrator** and **Runners**.
331
+
RDF-Connect uses a bidirectional communication protocol based on Protocol Buffers (protobuf) for interaction between the **Orchestrator** and **Runners**.
332
332
The orchestrator manages execution, while runners host and execute individual processors.
333
333
334
334
### Messages Sent to Runners
335
335
336
336
The orchestrator can send the following messages to the runner:
337
337
338
-
* Rpc.pipeline: Contains the complete pipeline configuration (in Turtle syntax).
339
-
* Rpc.proc: Instructs the runner to register and setup a new processor.
340
-
* Rpc.start: Signals that all added processors should begin execution.
341
-
* Rpc.close: Instructs the runner to gracefully shut down and terminate all processors.
342
-
* Rpc.msg: Delivers a message to a specific processor. Used for normal data transfer.
343
-
* Rpc.streamMsg: Begins a streaming message transmission to a processor, typically for large or chunked payloads.
338
+
* `RPC.pipeline`: Contains the complete pipeline configuration (in Turtle syntax).
339
+
* `RPC.proc`: Instructs the runner to register and setup a new processor.
340
+
* `RPC.start`: Signals that all added processors should begin execution.
341
+
* `RPC.close`: Instructs the runner to gracefully shut down and terminate all processors.
342
+
* `RPC.msg`: Delivers a message to a specific processor. Used for normal data transfer.
343
+
* `RPC.streamMsg`: Begins a streaming message transmission to a processor, typically for large or chunked payloads.
344
344
345
345
### Messages Sent from Runners
346
346
347
347
The runner can send the following messages to the orchestrator:
348
348
349
-
* Rpc.identify: Indicates that the runner is ready and provides their unique identifier (IRI).
350
-
* Rpc.init: Confirms that a previously registered **processor** has successfully started.
351
-
* Rpc.close: Notifies that the runner is shutting down because all processors have stopped.
352
-
* Rpc.msg: Sends a message from a processor to another processor via the orchestrator.
353
-
* Rpc.streamMsg: Sends a streaming message from a processor. Used when the payload is too large for a single message.
349
+
* `RPC.identify`: Indicates that the runner is ready and provides their unique identifier (IRI).
350
+
* `RPC.init`: Confirms that a previously registered **processor** has successfully started.
351
+
* `RPC.close`: Notifies that the runner is shutting down because all processors have stopped.
352
+
* `RPC.msg`: Sends a message from a processor to another processor via the orchestrator.
353
+
* `RPC.streamMsg`: Sends a streaming message from a processor. Used when the payload is too large for a single message.
354
354
355
355
356
356
## Orchestrator
@@ -391,7 +391,7 @@ Each pair includes:
391
391
<div class="example" title="Pipeline definition">
392
392
393
393
This example pipeline contains three processors divided over two runners.
394
-
The orchestrator starts for this pipeline, two runners and provide them with the correct processor configurations.
394
+
The orchestrator starts this pipeline with two runners and provides them with the correct processor configurations.
@@ -415,7 +415,7 @@ Each simple runner should specify two things:
415
415
- how the runner should be started, linked with `rdfc:command`
416
416
417
417
<aside class="note">
418
-
The property `rdfc:handlesSubjectsOf` plays a somewhat unconventional role. It allows RDFConnect to remain aligned with [PROV-O](https://www.w3.org/TR/prov-o/).
418
+
The property `rdfc:handlesSubjectsOf` plays a somewhat unconventional role. It allows RDF-Connect to remain aligned with [PROV-O](https://www.w3.org/TR/prov-o/).
419
419
A runner may link to a predicate such as `rdfc:jsImplementationOf`, and each JavaScript processor may then declare itself using:
420
420
```turtle
421
421
<FooBar> rdfc:jsImplementationOf rdfc:Processor.
@@ -449,7 +449,7 @@ Once all expected runners have successfully identified themselves, the orchestra
449
449
450
450
#### Starting processors
451
451
452
-
Once all runners have been successfully identified via Rpc.identify, the orchestrator proceeds to initialize the processors defined in the pipeline.
452
+
Once all runners have been successfully identified via `RPC.identify`, the orchestrator proceeds to initialize the processors defined in the pipeline.
453
453
This involves the extraction and transformation of processor configuration data into a format suitable for consumption by the associated runner.
454
454
455
455
**Processor Arguments**
@@ -516,11 +516,11 @@ Processor definitions are extracted using the same SHACL-based mechanism describ
516
516
517
517
**RPC message**
518
518
519
-
Once both the arguments and definition have been extracted for a processor instance, the orchestrator sends an `Rpc.proc` message to the appropriate runner, initiating the processor launch process.
519
+
Once both the arguments and definition have been extracted for a processor instance, the orchestrator sends an `RPC.proc` message to the appropriate runner, initiating the processor launch process.
520
520
521
521
The orchestrator MUST keep an internal record of all processor instance that have been dispatched to a runner, and the runner’s acknowledgment that a processor was successfully launched, as indicated by an incoming `Rpc.init` message.
522
522
523
-
No processor may be assumed to be operational until its runner has responded with `Rpc.init`.
523
+
No processor may be assumed to be operational until its runner has responded with `RPC.init`.
524
524
When all processors are successfully initialized, the orchestrator can start the pipeline.
When a runner sends a `Rpc.msg` message to the orchestrator, the message includes a channel IRI indicating its logical destination.
546
+
When a runner sends a `RPC.msg` message to the orchestrator, the message includes a channel IRI indicating its logical destination.
547
547
548
548
The orchestrator MUST:
549
549
1. Resolve the set of processors that are declared to consume this channel.
550
550
2. Determine which runner is responsible for each of those processors.
551
-
3. Forward the message to each relevant runner using `Rpc.msg`.
551
+
3. Forward the message to each relevant runner using `RPC.msg`.
552
552
553
553
These messages are sent as discrete units and fit within the allowed message size.
554
554
@@ -565,8 +565,8 @@ This protocol enables large messages to be sent incrementally over a separate gR
565
565
The process is as follows:
566
566
1. **Sender** (runner) initiates a `sendStreamMessage` gRPC stream to the orchestrator.
567
567
2. The **orchestrator** generates a unique stream identifier and sends it back on this stream.
568
-
3. The **sender** then sends a `Rpc.streamMsg` over the normal bidirectional RPC stream, including the stream identifier and the channel IRI.
569
-
4. The **orchestrator** resolves which processors receive messages on the given channel and forwards the stream identifier to their corresponding runners with `Rpc.streamMsg`.
568
+
3. The **sender** then sends a `RPC.streamMsg` over the normal bidirectional RPC stream, including the stream identifier and the channel IRI.
569
+
4. The **orchestrator** resolves which processors receive messages on the given channel and forwards the stream identifier to their corresponding runners with `RPC.streamMsg`.
570
570
5. **Receiving runners** connect to the orchestrator using `receiveStreamMessage`, passing the received stream identifier.
571
571
6. Once all participants are connected, the orchestrator acts as a relay: all incoming chunks from the sending stream are forwarded to each connected receiving stream.
572
572
@@ -587,7 +587,7 @@ While minimal runners can be implemented with little overhead, they often become
587
587
588
588
These quality-of-life features include:
589
589
590
-
* Wrapping readers and writer in idiomatic objects
590
+
* Wrapping readers and writers in idiomatic objects
591
591
* Runners SHOULD also make it possible to let processors start up before acknowledging to the orchestrator that the processor is initialized.
592
592
593
593
This is useful for processor to execute longer running operations, like reading a file or consulting an external API.
@@ -613,12 +613,12 @@ The following sections detail the runner startup flow and describe the expected
613
613
614
614
A runner is defined with an instance of `rdfc:Runner`, which currently is instantiated by a command.
615
615
It requires the command to actually start the runner with `rdfc:command`, and a link to the programming context (`rdfc:handlesSubjectsOf`).
616
-
The object of `rdfc:handlesSubjectsOf`, links runners and processors to a context term. This often refers to the programming language.
616
+
The object of `rdfc:handlesSubjectsOf` links runners and processors to a context term. This often refers to the programming language.
617
617
618
618
Each context term is related to a SHACL shape, this specifies the incoming data that the runner can use to start the processors.
619
619
620
620
<div class=example>
621
-
For example, the NodeRunner, a runner that starts JavaScript processors with node, is defined as follows.
621
+
The NodeRunner, for example, is a program that starts JavaScript processors with Node. It is defined as follows:
622
622
623
623
```turtle
624
624
rdfc:NodeRunner a rdfc:Runner;
@@ -676,10 +676,10 @@ After initialization, the orchestrator may send multiple `RPC.proc` messages to
676
676
Each message includes a processor IRI, a configuration object and an argument object.
677
677
Both the configuration and arguments are provided as JSON-LD strings.
678
678
The configuration object contains the arguments as defined by the context term following the section [Mapping SHACL to Configuration Structures](#mapping-shacl-to-configuration-structures).
679
-
The arguments are constructed based on a SHACL shape defined for the processor type
679
+
The arguments are constructed based on a SHACL shape defined for the processor type.
680
680
681
-
Runners are MAY to parse these JSON-LD payloads and transform known constructs into idiomatic equivalents.
682
-
For example reader and writer objects are represented as JSON-LD values with: an @id field (containing the channel IRI), and an @type field indicating either `https://w3id.org/rdf-connect/ontology#Reader` or `https://w3id.org/rdf-connect/ontology#Writer`.
681
+
Runners MAY parse these JSON-LD payloads and transform known constructs into idiomatic equivalents.
682
+
For example, reader and writer objects are represented as JSON-LD values with: an `@id` field (containing the channel IRI), and an `@type` field indicating either `https://w3id.org/rdf-connect/ontology#Reader` or `https://w3id.org/rdf-connect/ontology#Writer`.
683
683
Runners are RECOMMENDED to replace these values with appropriate typed objects in the target environment.
684
684
685
685
When a processor has been fully initialized, the runner MUST send an `RPC.init` message, indicating success or failure.
@@ -714,7 +714,7 @@ If the runner determines that the message size is small enough, it MAY convert a
714
714
715
715
#### Sending messages
716
716
717
-
Runners must also support outbound communication from processors.
717
+
Runners MUST also support outbound communication from processors.
718
718
While runners MAY omit support for certain advanced features (such as streaming output), a full implementation is strongly encouraged.
719
719
720
720
To send a normal message, the runner uses the `RPC.msg` method on the normal stream.
@@ -728,10 +728,10 @@ The message is considered complete when the runner closes the stream.
728
728
729
729
#### Channel Closure
730
730
731
-
Processors may indicate that a given channel is closed (i.e., no further messages will be sent).
732
-
The runner MUST propagate this information to the orchestrator via an RPC.close message.
731
+
Processors MAY indicate that a given channel is closed (i.e., no further messages will be sent).
732
+
The runner MUST propagate this information to the orchestrator via an `RPC.close` message.
733
733
734
-
Similarly, when the orchestrator sends an RPC.close message for a channel, the runner MAY respond by closing or invalidating the corresponding data stream in the processor.
734
+
Similarly, when the orchestrator sends an `RPC.close` message for a channel, the runner MAY respond by closing or invalidating the corresponding data stream in the processor.
0 commit comments