Commit 36c7123

jmacd, lquerel, utpilla, and albertlockett authored
Phase 2 work-in-progress documentation update (open-telemetry#1263)
This updates a number of key README files to explain and help navigate our new Rust-based OTAP Dataflow pipeline.

---------

Co-authored-by: Laurent Quérel <[email protected]>
Co-authored-by: Utkarsh Umesan Pillai <[email protected]>
Co-authored-by: albertlockett <[email protected]>
1 parent 2b7c754 commit 36c7123

6 files changed

+494
-84
lines changed

CONTRIBUTING.md

Lines changed: 10 additions & 0 deletions
@@ -27,6 +27,16 @@ To work with this repository, you'll need:
2727

2828
## Local Run/Build
2929

30+
Initialize Git submodules so that the OpenTelemetry protocol references
31+
used in building from `.proto` definitions can succeed:
32+
33+
```bash
34+
git submodule update --init --recursive
35+
```
36+
37+
When successful, you will find the directory `proto/opentelemetry-proto/`
38+
populated with the OpenTelemetry protocol definition used in this repository.
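As a quick sanity check after the submodule step, something like the following sketch can confirm that the directory was populated. The helper name `submodule_populated` is hypothetical and not part of this repository; it only tests that a directory exists and is non-empty.

```bash
# Hypothetical helper (not part of this repository): succeeds when the
# given directory exists and contains at least one entry.
submodule_populated() {
  dir="$1"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Check the path named above and print a hint if it is still empty.
if submodule_populated proto/opentelemetry-proto; then
  echo "proto submodule is populated"
else
  echo "proto submodule is empty; run: git submodule update --init --recursive" >&2
fi
```

Running `git submodule status` is another way to check: an uninitialized submodule is listed with a leading `-`.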
39+
3040
### How to set up and run a local OTel-Arrow collector
3141

3242
See [collector/README.md](./collector/README.md) for instructions on running the

README.md

Lines changed: 2 additions & 2 deletions
@@ -84,8 +84,8 @@ still fundamentally row-oriented.
8484

8585
We are building an end-to-end OpenTelemetry Protocol with Apache Arrow
8686
(OTAP) pipeline and we believe this form of pipeline will have substantially
87-
lower overhead than a row-oriented architecture. [See our Phase 2 design
88-
document](./docs/phase2-design.md).
87+
lower overhead than a row-oriented architecture. [See our Phase 2 OTAP
88+
Dataflow engine documentation](./rust/otap-dataflow/README.md).
8989

9090
These are our future milestones for OpenTelemetry and Apache Arrow
9191
integration:

go/README.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
# OTel-Arrow Go libraries
2+
3+
This folder contains the OTel-Arrow Go reference implementation. This
4+
implementation was built around the OpenTelemetry Collector in Golang,
5+
therefore it targets the `pdata` representation used in that system's
6+
pipeline, with top-level types known as `ptrace.Traces`,
7+
`pmetric.Metrics`, and `plog.Logs` corresponding with the payload of
8+
an OpenTelemetry Traces, Metrics, or Logs export request.
9+
10+
The primary use for this library involves converting between two
11+
representations:
12+
13+
- OTLP records: the Collector's in-memory data representation
14+
- OTAP stream: the OTel-Arrow batch of Arrow IPC stream records
15+
16+
The intermediate representation between the OTLP records and OTAP
17+
stream forms, known as "OTAP records", exists here; however, its design
18+
was not emphasized. Refer to the
19+
[OTel-Arrow-Rust](../rust/otel-arrow-rust/README.md) reference
20+
implementation for more details about handling the OTAP records format
21+
in memory.
22+
23+
## OpenTelemetry Collector Producer to OTAP stream
24+
25+
This library produces the OTel-Arrow OTAP stream representation of
26+
OpenTelemetry data from the standard representation, for use in
27+
OpenTelemetry Collector pipelines. [The `otelarrowexporter` component
28+
in the OpenTelemetry Collector-Contrib repository][OTELARROWEXPORTER]
29+
is the primary user of this feature, which first converts PData to
30+
OTAP records, then to OTAP bytes using an Arrow IPC writer.
31+
32+
The main Producer entry point for converting from OpenTelemetry
33+
Collector records into OTAP streams is found in
34+
[./pkg/otel/arrow_record/producer.go](./pkg/otel/arrow_record/producer.go)
35+
36+
[OTELARROWEXPORTER]: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/otelarrowexporter/README.md
37+
38+
## OTAP stream to OpenTelemetry Collector Consumer
39+
40+
This library consumes the OTel-Arrow OTAP stream representation of
41+
OpenTelemetry data and produces the standard representation, for use
42+
in OpenTelemetry Collector pipelines. [The `otelarrowreceiver`
43+
component in the OpenTelemetry Collector-Contrib
44+
repository][OTELARROWRECEIVER] is the primary user of this feature,
45+
which first converts OTAP bytes into OTAP records using an Arrow IPC
46+
reader, then converts the records back into the standard
47+
representation.
48+
49+
The main Consumer entry point for converting from OTAP streams into
50+
OpenTelemetry Collector records is found in
51+
[./pkg/otel/arrow_record/consumer.go](./pkg/otel/arrow_record/consumer.go)
52+
53+
[OTELARROWRECEIVER]: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/otelarrowreceiver/README.md

rust/README.md

Lines changed: 67 additions & 12 deletions
@@ -1,16 +1,71 @@
1-
# Rust components
1+
# OTel-Arrow Rust libraries
22

3-
This folder contains various Rust projects of varying stages of maturity.
3+
This folder contains the OTel-Arrow Rust sub-projects listed below.
44

5-
The `otap-dataflow/` folder contains the main deliverable of Phase 2 of the
6-
otel-arrow project, [as mentioned in its README](./otap-dataflow/README.md).
5+
## OTAP Dataflow
76

8-
All other folders are either experimental or initial donations of components
9-
that have yet to be incorporated into the main library.
7+
**[Sub-project README](./otap-dataflow/README.md)**
108

11-
| Folder | Type |
12-
|-----------------|------------------------------|
13-
| beaubourg | :handshake: Contributed Code |
14-
| experimental | :mag: Prototype |
15-
| otap-dataflow | :hammer: Core Component |
16-
| otel-arrow-rust | :handshake: Contributed Code |
9+
The `otap-dataflow` folder contains the project's primary dataflow
10+
engine for building OpenTelemetry pipelines with an Arrow-first
11+
approach. This component supports building and running the engine as a
12+
software library, suitable for embedding in other telemetry agents.
13+
14+
This crate includes a CLI tool named `df_engine` for test and
15+
demonstration purposes including a set of core components. In this
16+
form, the engine is configured with YAML configuration expressing the
17+
set of nodes and edges in the graph. The core components include OTLP
18+
receiver and exporter, OTAP receiver and exporter, batch and retry
19+
processors, debug processor, fake data generator, Parquet exporter,
20+
and a few more.
21+
22+
The primary data type of the OTAP dataflow engine is OTAP records
23+
format, consisting of a set of Arrow record batches corresponding with
24+
elements in the OpenTelemetry data model, by signal. The OTAP pipeline
25+
also supports passing through OTLP bytes as literal data, with
26+
**direct conversion** between the OTAP records and OTLP bytes models.
27+
28+
## OTel-Arrow Rust
29+
30+
**[Sub-project README](./otel-arrow-rust/README.md)**
31+
32+
The `otel-arrow-rust` folder contains the project's Rust reference
33+
implementation for OTel-Arrow, similar in nature to the [OTel-Arrow
34+
Golang library](../go/README.md) used by the project's Golang
35+
collector components. This library translates between the following
36+
representations of OpenTelemetry:
37+
38+
- OTAP records: represented using [Apache Arrow (arrow-rs)][ARROW_RS]
39+
record batches
40+
- OTLP records: represented using [Prost][PROST_RS] message objects
41+
- OTAP stream: represented as batches of [Arrow IPC][ARROW_IPC] stream
42+
- OTLP bytes: represented as bytes of [OpenTelemetry Protocol
43+
(OTLP)][OTLP] data
44+
45+
[ARROW_RS]: https://github.com/apache/arrow-rs/blob/main/README.md
46+
[PROST_RS]: https://github.com/tokio-rs/prost/blob/master/README.md
47+
[ARROW_IPC]: https://arrow.apache.org/docs/format/IPC.html
48+
[OTLP]: https://opentelemetry.io/docs/specs/otel/protocol/
49+
50+
This library provides a low-level interface for producing and consuming OTAP
51+
records. This library includes built-in support for batching and
52+
splitting of OTAP records. While this library is recommended any time
53+
you are converting between the representations listed above, note that
54+
the OTAP Dataflow engine includes an alternative that avoids
55+
materializing intermediate OTLP records. We recommend [PData
56+
Views](./otap-dataflow/crates/pdata-views/README.md) for producing and
57+
consuming OTLP bytes in the OTAP-Dataflow engine.
58+
59+
## Experimental
60+
61+
Here, find our experimental projects. As part of the OTel-Arrow Phase
62+
2 project scope ([project-phases](../docs/project-phases.md)), we are
63+
developing transform and filter capabilities based around the OTAP
64+
records representation.
65+
66+
- [Query abstraction: intermediate representation for common OTTL and
67+
KQL phrases](./experimental/query_abstraction/README.md)
68+
- [Query engine: reference implementation for the abstraction
69+
layer](./experimental/query_engine/README.md)
70+
- [Parquet query examples: querying OTel-Arrow data in Parquet
71+
files using DataFusion](./parquet_query_examples/README.md)
