From ed14d07c252c440933b11e54f6ab86615ec8742d Mon Sep 17 00:00:00 2001 From: Cijo Thomas Date: Wed, 12 Feb 2025 18:22:01 -0800 Subject: [PATCH 1/5] Add design docs --- docs/design/logs.md | 225 +++++++++++++++++++++++++++++++++++++++++ docs/design/metrics.md | 6 ++ docs/design/traces.md | 6 ++ 3 files changed, 237 insertions(+) create mode 100644 docs/design/logs.md create mode 100644 docs/design/metrics.md create mode 100644 docs/design/traces.md diff --git a/docs/design/logs.md b/docs/design/logs.md new file mode 100644 index 0000000000..8f373198b5 --- /dev/null +++ b/docs/design/logs.md @@ -0,0 +1,225 @@ +# OpenTelemetry Rust Logs Design + +Status: +[Development](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/document-status.md) + +## Overview + +OpenTelemetry (OTel) Logs support differs from Metrics and Traces as it does not +introduce a new logging API for end users. Instead, OTel recommends leveraging +existing logging libraries such as `log` and `tracing`, while providing bridges +(appenders) to route logs through OpenTelemetry. + +Unlike Traces and Metrics, which introduced new APIs, Logs took a different +approach due to the long history of existing logging solutions. In Rust, the +most widely used logging libraries are `log` and `tracing`. OTel Rust maintains +appenders for these libraries, allowing users to seamlessly integrate with +OpenTelemetry without changing their existing logging instrumentation. + +The `tracing` appender is particularly optimized for performance due to its +widespread adoption and the fact that `tracing` itself has a bridge from the +`log` crate. Notably, OpenTelemetry Rust itself is instrumented using `tracing` +for internal logs. Additionally, when OTel began supporting logging as a signal, +the `log` crate lacked structured logging support, reinforcing the decision to +prioritize `tracing`. 
+ +## Benefits of OpenTelemetry Logs + +- **Unified configuration** across Traces, Metrics, and Logs. +- **Automatic correlation** with Traces. +- **Consistent Resource attributes** across signals. +- **Multiple destinations support**: Logs can continue flowing to existing + destinations like stdout while also being sent to an OpenTelemetry-capable + backend, typically via an OTLP Exporter or exporters that export to operating + system native systems like `Windows ETW` or `Linux user_events`. +- **Standalone logging support** for applications that use OpenTelemetry as + their primary logging mechanism. + +## Key Design Principles + +- High performance - no locks/contention in the hot path, minimal/no heap + allocation. +- Capped resource usage - well-defined behavior when overloaded. +- Self-observable. +- Well defined Error handling, returning Result as appropriate instead of panic. +- Minimal public API, exposing based on need only. + +## Logs API + +The OTel Logs API is not intended for direct end-user usage. Instead, it is +designed for appender/bridge authors to integrate existing logging libraries +with OpenTelemetry. However, there is nothing preventing it from being used by +end-users. + +### API Components + +1. **Key-Value Structs**: Used in `LogRecord`, where keys are shared across + signals but values differ from Metrics and Traces. This is because values in + Logs can contain more complex structures than those in Traces and Metrics. +2. **Traits**: + - `LoggerProvider` - provides methods to obtain Logger. + - `Logger` - provides methods to create LogRecord and emit the created + LogRecord. + - `LogRecord` - provides methods to populate LogRecord. +3. **No-Op Implementations**: By default, the API performs no operations until + an SDK is attached. + +### Logs Flow + +1. Obtain a `LoggerProvider` implementation. +2. Use the `LoggerProvider` to create `Logger` instances, specifying a scope + name (module/component emitting logs). 
Optional attributes and version are + also supported. +3. Use the `Logger` to create an empty `LogRecord` instance. +4. Populate the `LogRecord` with body, timestamp, attributes, etc. +5. Call `Logger.emit(LogRecord)` to process and export the log. + +If only the Logs API is used (without an SDK), all the above steps result in no +operations, following OpenTelemetry’s philosophy of separating API from SDK. The +official Logs SDK provides real implementations to process and export logs. +Users or vendors can also provide alternative SDK implementations. + +## Logs SDK + +The OpenTelemetry Logs SDK provides an OTel specification-compliant +implementation of the Logs API, handling log processing and export. + +### Core Components + +#### `SdkLoggerProvider` + +- Implements the `LoggerProvider` trait. +- Creates and manages `SdkLogger` instances. +- Holds logging configuration, including `Resource` and processors. +- Does not retain a list of created loggers. Instead, it passes an owned clone + of itself to each logger created. This is done so that loggers get a hold of + the configuration (like which processor to invoke). +- Uses an `Arc` and delegates all configuration to + `LoggerProviderInner`. This allows cheap cloning of itself and ensures all + clones point to the same underlying configuration. +- As `SdkLoggerProvider` only holds an `Arc` of its inner, it can only accept + `&self` in its methods like flush and shutdown. Else it needs to rely on + interior mutability that comes with runtime performance costs. Since methods + like shutdown usually need to mutate interior state, components like exporter + use interior mutability to handle shutdown. (More on this in the exporter + section) +- `LoggerProviderInner` implements `Drop`, triggering `shutdown()` when no + references remain. However, in practice, loggers are often stored statically + inside appenders (like tracing-appender), so explicit shutdown by the user is + required. 
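The ownership pattern described for `SdkLoggerProvider` can be modeled in a few lines of std-only Rust. This is a simplified sketch, not the `opentelemetry_sdk` source. `LoggerProviderSketch`, `LoggerSketch`, `ProviderInner`, `ShutdownError`, the `resource` string and the `AtomicBool` shutdown flag are all hypothetical stand-ins. The sketch shows four things:

- The provider is a thin wrapper over an `Arc`, so clones are cheap and share one configuration.
- Loggers hold an owned clone of the provider rather than a `Weak`.
- `shutdown(&self)` relies on interior mutability.
- A `Drop` on the inner type fires only once the last clone is gone.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ShutdownError {
    AlreadyShutdown,
}

/// Shared state; a real SDK would keep the Resource and processors here.
struct ProviderInner {
    resource: String, // stand-in for Resource attributes
    is_shutdown: AtomicBool,
}

impl ProviderInner {
    fn shutdown(&self) -> Result<(), ShutdownError> {
        // `swap` returns the previous value, so only the first call succeeds.
        if self.is_shutdown.swap(true, Ordering::SeqCst) {
            return Err(ShutdownError::AlreadyShutdown);
        }
        // A real SDK would flush and shut down its processors here.
        Ok(())
    }
}

impl Drop for ProviderInner {
    fn drop(&mut self) {
        // Runs only when the last provider/logger clone is dropped.
        let _ = self.shutdown();
    }
}

/// Thin handle: cloning only increments the `Arc` reference count.
#[derive(Clone)]
pub struct LoggerProviderSketch {
    inner: Arc<ProviderInner>,
}

impl LoggerProviderSketch {
    pub fn new(resource: &str) -> Self {
        LoggerProviderSketch {
            inner: Arc::new(ProviderInner {
                resource: resource.to_string(),
                is_shutdown: AtomicBool::new(false),
            }),
        }
    }

    /// Each logger owns a clone of the provider (not a `Weak`), so emitting
    /// never pays for a `Weak` -> `Arc` upgrade.
    pub fn logger(&self, scope: &str) -> LoggerSketch {
        LoggerSketch { scope: scope.to_string(), provider: self.clone() }
    }

    /// Only `&self` is available through the shared `Arc`, so the state
    /// change goes through an atomic (interior mutability).
    pub fn shutdown(&self) -> Result<(), ShutdownError> {
        self.inner.shutdown()
    }
}

pub struct LoggerSketch {
    scope: String,
    provider: LoggerProviderSketch,
}

impl LoggerSketch {
    /// Returns the formatted record, or `None` once the provider is shut down.
    pub fn emit(&self, body: &str) -> Option<String> {
        let inner = &self.provider.inner;
        if inner.is_shutdown.load(Ordering::SeqCst) {
            return None;
        }
        Some(format!("{} [{}] {}", inner.resource, self.scope, body))
    }
}
```

A logger keeps the provider alive, so the `Drop`-triggered shutdown never runs while an appender still holds a logger. This is why the explicit `shutdown()` call matters in practice.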
+ +#### `SdkLogger` + +- Implements the `Logger` trait. +- Creates `SdkLogRecord` instances and emits them. +- Calls `OnEmit()` on all registered processors when emitting logs. +- Passes mutable references to each processor (`&mut log_record`), i.e., + ownership is not passed to the processor. This ensures that the logger avoids + cloning costs. Since a mutable reference is passed, processors can modify the + log, and the changes are visible to the next processor in the chain. +- Since the processor only gets a reference to the log, it cannot store it + beyond the `OnEmit()` call. If a processor needs to buffer logs, it must explicitly + copy them to the heap. +- This design allows for stack-only log processing when exporting to operating + system native facilities like `Windows ETW` or `Linux user_events`. +- OTLP exporting requires network calls (HTTP/gRPC) and batching of logs for + efficiency. In that case, log records are buffered by copying them to the + heap. (More on this in the BatchLogProcessor section.) + +#### `LogRecord` + +- Holds log data, including attributes. +- Stores up to 5 attributes in an inline array, avoiding heap allocation in the common case. +- Falls back to a heap-allocated `Vec` if more attributes are required. +- Inspired by Go’s `slog` library for efficiency. + +#### LogRecord Processors + +`SdkLoggerProvider` can be configured with any number of LogProcessors, +which are called in the order of registration. Log records are passed to the +`OnEmit` method of LogProcessor. LogProcessors can be used to process the log +records, enrich them, filter them, and export them to destinations by leveraging +LogRecord Exporters. + +The following built-in log processors are provided in the Logs SDK: + +##### SimpleLogProcessor + +This processor is designed to be used for exporting purposes. Export is handled +by an Exporter (which is a separate component).
SimpleLogProcessor is "simple" +in the sense that it does not attempt to do any processing - it just calls the +exporter and passes the log record to it. To comply with the OTel specification, it +synchronizes calls to the `Export()` method, i.e., only one `Export()` call is +in progress at any given time. + +SimpleLogProcessor is intended only for testing/learning purposes and is often used +along with a `stdout` exporter. + +##### BatchLogProcessor + +This is another "exporting" processor. As with SimpleLogProcessor, a different +component named LogExporter handles the actual export logic. BatchLogProcessor +buffers/batches the logs it receives into an in-memory buffer. It invokes the +exporter every 1 second or when 512 items are in the batch (customizable). It +uses a background thread to do the export, and communication between the user +thread (where logs are emitted) and the background thread occurs with `mpsc` +channels. + +The maximum number of items the buffer holds is 2048 (customizable). Once the limit +is reached, any *new* logs are dropped. It *does not* apply back-pressure to the +user thread and instead drops logs. + +As with SimpleLogProcessor, this component also ensures only one export is +active at a given time. Achieving higher throughput in some environments requires +a modified version of this processor. + +In this design, at most 2048+512 logs can be in memory at any given point. In +other words, that many logs can be lost if the app crashes before they are exported. + +## LogExporters + +LogExporters are responsible for exporting logs to a destination. Examples +include: + +1. **InMemoryExporter** - exports to an in-memory list, primarily for + unit-testing. This is used extensively in the repo itself, and external users + are also encouraged to use this. +2. **Stdout exporter** - prints telemetry to stdout. Only for debugging/learning + purposes. The output format is not defined and is not + optimized for performance.
A production-recommended version with a standardized output format + is planned. +3. **OTLP Exporter** - OTel's official exporter which uses the OTLP protocol + that is designed with the OTel data model in mind. Both HTTP and gRPC-based + exporting are offered. +4. **Exporters to OS Kernel facilities** - These exporters are not maintained in + the core repo but are listed for completeness. They export telemetry to Windows ETW + or Linux user_events. They are designed for high-performance workloads. Because + they export synchronously, they do not require + buffering/batching. This allows log processing to happen entirely on the stack and to + scale easily with the number of CPU cores. (The kernel uses per-CPU buffers for + the events, ensuring no contention.) + +## `tracing` Log Appender + +The `tracing` appender bridges `tracing` logs to OpenTelemetry. Logs emitted via +`tracing` macros (`info!`, `warn!`, etc.) are forwarded to OpenTelemetry through +this integration. + +- `tracing` is designed for high performance, using *layers* or *subscribers* to + handle emitted logs (events). +- The appender implements a `Layer`, receiving logs from `tracing`. +- Uses the OTel Logs API to create `LogRecord`, populate it, and emit it via + `Logger.emit(LogRecord)`. +- If no Logs SDK is present, the process is a no-op. + +## Summary + +- OpenTelemetry Logs does not provide a user-facing logging API. +- Instead, it integrates with existing logging libraries (`log`, `tracing`). +- The Logs API defines key traits but performs no operations unless an SDK is + installed. +- The Logs SDK enables log processing, transformation, and export. +- The Logs SDK is performance optimized to minimize copying and heap allocation, + wherever feasible. +- The `tracing` appender efficiently routes logs to OpenTelemetry without + modifying existing logging workflows.
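The BatchLogProcessor behavior can be approximated with the standard library alone. This is a hedged sketch, not the SDK implementation. `BatchSketch` and `run_worker` are hypothetical, records are plain `String`s, and "exporting" just collects batches into a `Vec`. The sketch maps the processor's parts as follows:

- A bounded `sync_channel` stands in for the in-memory buffer.
- When the buffer is full, `try_send` drops new records instead of blocking the emitting thread.
- One background thread exports a batch when it reaches `max_batch` records or when `interval` elapses, so only one export is ever in flight.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, RecvTimeoutError, SyncSender, TrySendError};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Background loop: "exports" a batch when it is full or when the interval
/// elapses. Returns every exported batch once all senders are dropped.
fn run_worker(rx: Receiver<String>, max_batch: usize, interval: Duration) -> Vec<Vec<String>> {
    let mut exported = Vec::new();
    let mut batch = Vec::with_capacity(max_batch);
    let mut deadline = Instant::now() + interval;
    loop {
        let timeout = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(timeout) {
            Ok(record) => {
                batch.push(record);
                if batch.len() >= max_batch {
                    // Size-triggered export.
                    exported.push(std::mem::take(&mut batch));
                    deadline = Instant::now() + interval;
                }
            }
            Err(RecvTimeoutError::Timeout) => {
                // Time-triggered export of whatever has accumulated.
                if !batch.is_empty() {
                    exported.push(std::mem::take(&mut batch));
                }
                deadline = Instant::now() + interval;
            }
            Err(RecvTimeoutError::Disconnected) => {
                // Channel drained and closed (shutdown): export the rest.
                if !batch.is_empty() {
                    exported.push(batch);
                }
                return exported;
            }
        }
    }
}

pub struct BatchSketch {
    sender: SyncSender<String>,
    dropped: AtomicUsize,
    worker: JoinHandle<Vec<Vec<String>>>,
}

impl BatchSketch {
    pub fn new(max_queue: usize, max_batch: usize, interval: Duration) -> Self {
        let (sender, rx) = sync_channel(max_queue);
        let worker = thread::spawn(move || run_worker(rx, max_batch, interval));
        BatchSketch { sender, dropped: AtomicUsize::new(0), worker }
    }

    /// Called on the user thread with a borrowed record: copy it to the heap
    /// and hand it off without blocking.
    pub fn on_emit(&self, record: &str) {
        match self.sender.try_send(record.to_string()) {
            Ok(()) => {}
            // Full: no back-pressure, the new record is dropped.
            // Disconnected: the worker is gone, so the record is dropped too.
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                self.dropped.fetch_add(1, Ordering::Relaxed);
            }
        }
    }

    /// Closes the channel, waits for the final export, and returns
    /// (exported batches, number of dropped records).
    pub fn shutdown(self) -> (Vec<Vec<String>>, usize) {
        let BatchSketch { sender, dropped, worker } = self;
        drop(sender);
        let batches = worker.join().expect("export thread panicked");
        (batches, dropped.into_inner())
    }
}
```

Constructed as `BatchSketch::new(2048, 512, Duration::from_secs(1))`, the sketch mirrors the defaults described for BatchLogProcessor. Up to 2048 records can wait in the channel while the worker holds a partially filled batch of up to 512, which gives the rough 2048+512 in-memory bound.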
diff --git a/docs/design/metrics.md b/docs/design/metrics.md new file mode 100644 index 0000000000..18660ccea1 --- /dev/null +++ b/docs/design/metrics.md @@ -0,0 +1,6 @@ +# OpenTelemetry Rust Metrics Design + +Status: +[Development](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/document-status.md) + +TODO: diff --git a/docs/design/traces.md b/docs/design/traces.md new file mode 100644 index 0000000000..6311a73dcc --- /dev/null +++ b/docs/design/traces.md @@ -0,0 +1,6 @@ +# OpenTelemetry Rust Traces Design + +Status: +[Development](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/document-status.md) + +TODO: From fc275c1a8f9aa11b21ec55809dc9a8cf5b4d73fa Mon Sep 17 00:00:00 2001 From: Cijo Thomas Date: Wed, 12 Feb 2025 18:31:47 -0800 Subject: [PATCH 2/5] more --- docs/design/logs.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/docs/design/logs.md b/docs/design/logs.md index 8f373198b5..dda25d269d 100644 --- a/docs/design/logs.md +++ b/docs/design/logs.md @@ -103,6 +103,9 @@ implementation of the Logs API, handling log processing and export. like shutdown usually need to mutate interior state, components like exporter use interior mutability to handle shutdown. (More on this in the exporter section) +- An alternative design was to let `SdkLogger` hold a `Weak` reference to the + `SdkLoggerProvider`. This would be a `weak->arc` upgrade in every log emission, + significantly affecting throughput. - `LoggerProviderInner` implements `Drop`, triggering `shutdown()` when no references remain. However, in practice, loggers are often stored statically inside appenders (like tracing-appender), so explicit shutdown by the user is @@ -212,6 +215,18 @@ this integration. `Logger.emit(LogRecord)`. - If no Logs SDK is present, the process is a no-op. 
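The API/SDK split, the processor chain and the appender's create-populate-emit flow can be modeled together in one small sketch. The traits reuse names from the Logs API section, but their shapes are simplified assumptions and do not match the real `opentelemetry` crate signatures. `Record`, `Processor`, `SdkLoggerSketch`, `UppercaseSeverity`, `Collect` and `appender_on_event` are hypothetical. With `NoopLogger` every call does nothing. The SDK-style logger instead passes `&mut Record` to each processor in registration order, so a later processor sees earlier changes, and any processor that wants to keep a record must clone it.

```rust
use std::sync::{Arc, Mutex};

/// Simplified stand-in for an API log record.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Record {
    pub severity: String,
    pub body: String,
}

/// Simplified Logger trait: create an empty record, then emit it.
pub trait Logger {
    fn create_log_record(&self) -> Record {
        Record::default()
    }
    fn emit(&self, record: Record);
}

/// What the API does when no SDK is installed: nothing.
pub struct NoopLogger;

impl Logger for NoopLogger {
    fn emit(&self, _record: Record) {}
}

/// Processors borrow the record mutably; they never take ownership.
pub trait Processor {
    fn on_emit(&self, record: &mut Record);
}

/// SDK-style logger that runs processors in registration order.
pub struct SdkLoggerSketch {
    processors: Vec<Box<dyn Processor>>,
}

impl SdkLoggerSketch {
    pub fn new() -> Self {
        SdkLoggerSketch { processors: Vec::new() }
    }

    pub fn with_processor(mut self, p: impl Processor + 'static) -> Self {
        self.processors.push(Box::new(p));
        self
    }
}

impl Logger for SdkLoggerSketch {
    fn emit(&self, mut record: Record) {
        for p in &self.processors {
            // Changes made by one processor are visible to the next one.
            p.on_emit(&mut record);
        }
    }
}

/// Enriching processor: modifies the record in place.
pub struct UppercaseSeverity;

impl Processor for UppercaseSeverity {
    fn on_emit(&self, record: &mut Record) {
        record.severity = record.severity.to_uppercase();
    }
}

/// Buffering processor: it only has a borrow, so it clones to keep a copy.
pub struct Collect(pub Arc<Mutex<Vec<Record>>>);

impl Processor for Collect {
    fn on_emit(&self, record: &mut Record) {
        self.0.lock().unwrap().push(record.clone());
    }
}

/// Roughly what an appender does per event: create, populate, emit.
pub fn appender_on_event(logger: &dyn Logger, level: &str, message: &str) {
    let mut record = logger.create_log_record();
    record.severity = level.to_string();
    record.body = message.to_string();
    logger.emit(record);
}
```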
+## Performance + +// Call out things done specifically for performance + +### Perf test - benchmarks + +// Share ~~ numbers + +### Perf test - stress test + +// Share ~~ numbers + ## Summary - OpenTelemetry Logs does not provide a user-facing logging API. From 0cdf9c78d14977e20da7bb6fba34aa474b62c044 Mon Sep 17 00:00:00 2001 From: Cijo Thomas Date: Thu, 13 Feb 2025 09:34:12 -0800 Subject: [PATCH 3/5] address feedback and add mermaid --- docs/design/logs.md | 112 ++++++++++++++++++++++++++++++++------------ 1 file changed, 82 insertions(+), 30 deletions(-) diff --git a/docs/design/logs.md b/docs/design/logs.md index dda25d269d..9a44ebbc69 100644 --- a/docs/design/logs.md +++ b/docs/design/logs.md @@ -5,16 +5,20 @@ Status: ## Overview -OpenTelemetry (OTel) Logs support differs from Metrics and Traces as it does not -introduce a new logging API for end users. Instead, OTel recommends leveraging -existing logging libraries such as `log` and `tracing`, while providing bridges -(appenders) to route logs through OpenTelemetry. - -Unlike Traces and Metrics, which introduced new APIs, Logs took a different -approach due to the long history of existing logging solutions. In Rust, the -most widely used logging libraries are `log` and `tracing`. OTel Rust maintains -appenders for these libraries, allowing users to seamlessly integrate with -OpenTelemetry without changing their existing logging instrumentation. +[OpenTelemetry (OTel) +Logs](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/README.md) +support differs from Metrics and Traces as it does not introduce a new logging +API for end users. Instead, OTel recommends leveraging existing logging +libraries such as [log](https://crates.io/crates/log) and +[tracing](https://crates.io/crates/tracing), while providing bridges (appenders) +to route logs through OpenTelemetry. + +OTel took this different approach due to the long history of existing logging +solutions. 
In Rust, these are [log](https://crates.io/crates/log) and +[tracing](https://crates.io/crates/tracing), and have been embraced in the +community for some time. OTel Rust maintains appenders for these libraries, +allowing users to seamlessly integrate with OpenTelemetry without changing their +existing logging instrumentation. The `tracing` appender is particularly optimized for performance due to its widespread adoption and the fact that `tracing` itself has a bridge from the @@ -29,21 +33,55 @@ prioritize `tracing`. - **Automatic correlation** with Traces. - **Consistent Resource attributes** across signals. - **Multiple destinations support**: Logs can continue flowing to existing - destinations like stdout while also being sent to an OpenTelemetry-capable - backend, typically via an OTLP Exporter or exporters that export to operating - system native systems like `Windows ETW` or `Linux user_events`. + destinations like stdout etc. while also being sent to an + OpenTelemetry-capable backend, typically via an OTLP Exporter or exporters + that export to operating system native systems like `Windows ETW` or `Linux + user_events`. - **Standalone logging support** for applications that use OpenTelemetry as their primary logging mechanism. ## Key Design Principles -- High performance - no locks/contention in the hot path, minimal/no heap - allocation. -- Capped resource usage - well-defined behavior when overloaded. -- Self-observable. -- Well defined Error handling, returning Result as appropriate instead of panic. +- High performance - no locks/contention in the hot path with minimal/no heap + allocation where possible. +- Capped resource (memory) usage - well-defined behavior when overloaded. +- Self-observable - exposes telemetry about itself to aid in troubleshooting + etc. +- Robust error handling, returning Result where possible instead of panicking. - Minimal public API, exposing based on need only. 
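The "minimal/no heap allocation where possible" principle drives the `LogRecord` attribute storage described in the SDK section: up to 5 attributes inline, then a `Vec`. A hedged std-only sketch of that idea follows. `Attributes`, `Attr` and `INLINE_CAP` are hypothetical names, and the `(&'static str, i64)` attribute type replaces the SDK's real key/value types to keep the example self-contained. An empty `Vec` does not allocate, so records with five or fewer attributes never touch the heap for attribute storage.

```rust
/// Attributes stored inline before spilling to the heap (5 in the SDK design).
const INLINE_CAP: usize = 5;

/// Hypothetical simplified attribute: (key, integer value).
pub type Attr = (&'static str, i64);

pub struct Attributes {
    inline: [Option<Attr>; INLINE_CAP], // lives inside the record, no allocation
    len: usize,
    overflow: Vec<Attr>, // stays empty (unallocated) until a 6th attribute
}

impl Attributes {
    pub fn new() -> Self {
        Attributes { inline: [None; INLINE_CAP], len: 0, overflow: Vec::new() }
    }

    pub fn push(&mut self, attr: Attr) {
        if self.len < INLINE_CAP {
            self.inline[self.len] = Some(attr);
        } else {
            // Fallback path: heap-allocated storage for the remainder.
            self.overflow.push(attr);
        }
        self.len += 1;
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// True once any attribute had to be stored on the heap.
    pub fn spilled(&self) -> bool {
        !self.overflow.is_empty()
    }

    /// Iterates inline attributes first, then overflow, in insertion order.
    pub fn iter(&self) -> impl Iterator<Item = &Attr> + '_ {
        self.inline
            .iter()
            .filter_map(|a| a.as_ref())
            .chain(self.overflow.iter())
    }
}
```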
+## Architecture Overview + +```mermaid +graph TD + subgraph Application + A1[Application Code] + end + subgraph Logging Libraries + B1[log crate] + B2[tracing crate] + end + subgraph OpenTelemetry + C1[OpenTelemetry Appender for log] + C2[OpenTelemetry Appender for tracing] + C3[OpenTelemetry Logs API] + C4[OpenTelemetry Logs SDK] + C5[OTLP Exporter] + end + subgraph Observability Backend + D1[OTLP-Compatible Backend] + end + A1 --> |Emits Logs| B1 + A1 --> |Emits Logs| B2 + B1 --> |Bridged by| C1 + B2 --> |Bridged by| C2 + C1 --> |Sends to| C3 + C2 --> |Sends to| C3 + C3 --> |Processes with| C4 + C4 --> |Exports via| C5 + C5 --> |Sends to| D1 +``` + ## Logs API The OTel Logs API is not intended for direct end-user usage. Instead, it is @@ -53,9 +91,10 @@ end-users. ### API Components -1. **Key-Value Structs**: Used in `LogRecord`, where keys are shared across - signals but values differ from Metrics and Traces. This is because values in - Logs can contain more complex structures than those in Traces and Metrics. +1. **Key-Value Structs**: Used in `LogRecord`, where the `Key` struct is shared + across signals but the `Value` struct differs from Metrics and Traces. This is + because values in Logs can contain more complex structures than those in + Traces and Metrics. 2. **Traits**: - `LoggerProvider` - provides methods to obtain Logger. - `Logger` - provides methods to create LogRecord and emit the created @@ -88,6 +127,9 @@ implementation of the Logs API, handling log processing and export. #### `SdkLoggerProvider` +This is the implementation of the `LoggerProvider` trait and deals with concerns such +as processing and exporting Logs. + - Implements the `LoggerProvider` trait. - Creates and manages `SdkLogger` instances. - Holds logging configuration, including `Resource` and processors. @@ -97,15 +139,15 @@ implementation of the Logs API, handling log processing and export. - Uses an `Arc` and delegates all configuration to
This allows cheap cloning of itself and ensures all clones point to the same underlying configuration. -- As `SdkLoggerProvider` only holds an `Arc` of its inner, it can only accept +- As `SdkLoggerProvider` only holds an `Arc` of its inner, it can only take `&self` in its methods like flush and shutdown. Else it needs to rely on interior mutability that comes with runtime performance costs. Since methods - like shutdown usually need to mutate interior state, components like exporter - use interior mutability to handle shutdown. (More on this in the exporter - section) + like shutdown usually need to mutate interior state, but this component can + only take `&self`, it defers to components like exporter to use interior + mutability to handle shutdown. (More on this in the exporter section) - An alternative design was to let `SdkLogger` hold a `Weak` reference to the - `SdkLoggerProvider`. This would be a `weak->arc` upgrade in every log emission, - significantly affecting throughput. + `SdkLoggerProvider`. This would be a `weak->arc` upgrade in every log + emission, significantly affecting throughput. - `LoggerProviderInner` implements `Drop`, triggering `shutdown()` when no references remain. However, in practice, loggers are often stored statically inside appenders (like tracing-appender), so explicit shutdown by the user is @@ -113,6 +155,9 @@ implementation of the Logs API, handling log processing and export. #### `SdkLogger` +This is an implementation of the `Logger`, and contains functionality to create +and emit logs. + - Implements the `Logger` trait. - Creates `SdkLogRecord` instances and emits them. - Calls `OnEmit()` on all registered processors when emitting logs. @@ -204,9 +249,9 @@ include: ## `tracing` Log Appender -The `tracing` appender bridges `tracing` logs to OpenTelemetry. Logs emitted via -`tracing` macros (`info!`, `warn!`, etc.) are forwarded to OpenTelemetry through -this integration. 
+The `tracing` appender bridges `tracing` logs (events) to OpenTelemetry. Logs +emitted via `tracing` macros (`info!`, `warn!`, etc.) are forwarded to +OpenTelemetry through this integration. - `tracing` is designed for high performance, using *layers* or *subscribers* to handle emitted logs (events). @@ -215,6 +260,13 @@ this integration. `Logger.emit(LogRecord)`. - If no Logs SDK is present, the process is a no-op. +Note on terminology: Within OpenTelemetry, "tracing" refers to distributed +tracing (i.e., the creation of Spans), not in-process structured logging or +execution traces. The `tracing` crate can create Spans as well as Events. Only +the Events from the `tracing` crate are converted to OTel Logs when using this +appender; Spans created with the `tracing` crate are not handled by this +appender. + ## Performance // Call out things done specifically for performance From 8487495dcb6f683a93caff9fb3104f9af43d4ec0 Mon Sep 17 00:00:00 2001 From: Cijo Thomas Date: Thu, 13 Feb 2025 09:34:12 -0800 Subject: [PATCH 4/5] add link to crates --- docs/design/logs.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/docs/design/logs.md b/docs/design/logs.md index 9a44ebbc69..9aa18f6bc8 100644 --- a/docs/design/logs.md +++ b/docs/design/logs.md @@ -84,6 +84,9 @@ graph TD ## Logs API +The Logs API is part of the [opentelemetry](https://crates.io/crates/opentelemetry) +crate. + The OTel Logs API is not intended for direct end-user usage. Instead, it is designed for appender/bridge authors to integrate existing logging libraries with OpenTelemetry. However, there is nothing preventing it from being used by @@ -120,6 +123,9 @@ Users or vendors can also provide alternative SDK implementations. ## Logs SDK +The Logs SDK is part of the +[opentelemetry_sdk](https://crates.io/crates/opentelemetry_sdk) crate. + The OpenTelemetry Logs SDK provides an OTel specification-compliant implementation of the Logs API, handling log processing and export.
@@ -249,9 +255,9 @@ include: ## `tracing` Log Appender -The `tracing` appender bridges `tracing` logs (events) to OpenTelemetry. Logs -emitted via `tracing` macros (`info!`, `warn!`, etc.) are forwarded to -OpenTelemetry through this integration. +The `tracing` appender bridges `tracing` logs to OpenTelemetry. Logs emitted via +`tracing` macros (`info!`, `warn!`, etc.) are forwarded to OpenTelemetry through +this integration. - `tracing` is designed for high performance, using *layers* or *subscribers* to handle emitted logs (events). From 90553773161b0d31e4543a38f120f7724fab346e Mon Sep 17 00:00:00 2001 From: Cijo Thomas Date: Thu, 13 Feb 2025 09:37:03 -0800 Subject: [PATCH 5/5] fix link --- docs/design/logs.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/design/logs.md b/docs/design/logs.md index 9aa18f6bc8..e6e78c1e7f 100644 --- a/docs/design/logs.md +++ b/docs/design/logs.md @@ -255,6 +255,10 @@ include: ## `tracing` Log Appender +The `tracing` appender is part of the +[opentelemetry-appender-tracing](https://crates.io/crates/opentelemetry-appender-tracing) +crate. + The `tracing` appender bridges `tracing` logs to OpenTelemetry. Logs emitted via `tracing` macros (`info!`, `warn!`, etc.) are forwarded to OpenTelemetry through this integration.