Commit c0bd086

Merge branch 'main' into add-timeout-duration-to-shutdown
2 parents 41b620d + 52cd0e9 commit c0bd086

70 files changed

+1939
-456
lines changed

.github/workflows/integration_tests.yml

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -23,5 +23,7 @@ jobs:
2323
with:
2424
components: rustfmt
2525
- uses: arduino/setup-protoc@v3
26+
with:
27+
repo-token: ${{ secrets.GITHUB_TOKEN }}
2628
- name: Run integration tests
2729
run: ./scripts/integration_tests.sh

.github/workflows/pr_criterion.yaml

Lines changed: 4 additions & 1 deletion
@@ -6,9 +6,12 @@ jobs:
66
permissions:
77
pull-requests: write
88
runs-on: ubuntu-latest
9+
if: ${{ contains(github.event.pull_request.labels.*.name, 'performance') }}
910
steps:
10-
- uses: actions/checkout@v3
11+
- uses: actions/checkout@v4
1112
- uses: arduino/setup-protoc@v3
13+
with:
14+
repo-token: ${{ secrets.GITHUB_TOKEN }}
1215
- uses: dtolnay/rust-toolchain@master
1316
with:
1417
toolchain: stable

.github/workflows/pr_naming.yml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ jobs:
99
runs-on: ubuntu-latest
1010
steps:
1111
- name: PR Conventional Commit Validation
12-
uses: ytanikin/[email protected].0
12+
uses: ytanikin/[email protected].1
1313
with:
1414
task_types: '["build","chore","ci","docs","feat","fix","perf","refactor","revert","test"]'
1515
add_label: 'false'

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ for specific dates and for Zoom meeting links. "OTel Rust SIG" is the name of
88
meeting for this group.
99

1010
Meeting notes are available as a public [Google
11-
doc](https://docs.google.com/document/d/1tGKuCsSnyT2McDncVJrMgg74_z8V06riWZa0Sr79I_4/edit).
11+
doc](https://docs.google.com/document/d/12upOzNk8c3SFTjsL6IRohCWMgzLKoknSCOOdMakbWo4/edit).
1212
If you have trouble accessing the doc, please get in touch on
1313
[Slack](https://cloud-native.slack.com/archives/C03GDP0H023).
1414

Cargo.toml

Lines changed: 7 additions & 0 deletions
@@ -85,5 +85,12 @@ opentelemetry = { path = "opentelemetry" }
8585
opentelemetry_sdk = { path = "opentelemetry-sdk" }
8686
opentelemetry-stdout = { path = "opentelemetry-stdout" }
8787

88+
[workspace.lints.rust]
89+
rust_2024_compatibility = { level = "warn", priority = -1 }
90+
# No need to enable these, because they are either not needed or result in ugly syntax
91+
edition_2024_expr_fragment_specifier = "allow"
92+
if_let_rescope = "allow"
93+
tail_expr_drop_order = "allow"
94+
8895
[workspace.lints.clippy]
8996
all = { level = "warn", priority = 1 }

docs/adr/001_error_handling.md

Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
1+
# Error handling patterns in public API interfaces
2+
## Date
3+
27 Feb 2025
4+
5+
## Summary
6+
7+
This ADR describes the general pattern we will follow when modelling errors in public API interfaces - that is, APIs that are exposed to users of the project's published crates. It summarises the discussion and final option from [#2571](https://github.com/open-telemetry/opentelemetry-rust/issues/2571); for more context check out that issue.
8+
9+
We will focus on the exporter traits in this example, but the outcome should be applied to _all_ public traits and their fallible operations.
10+
11+
These include [SpanExporter](https://github.com/open-telemetry/opentelemetry-rust/blob/eca1ce87084c39667061281e662d5edb9a002882/opentelemetry-sdk/src/trace/export.rs#L18), [LogExporter](https://github.com/open-telemetry/opentelemetry-rust/blob/eca1ce87084c39667061281e662d5edb9a002882/opentelemetry-sdk/src/logs/export.rs#L115), and [PushMetricExporter](https://github.com/open-telemetry/opentelemetry-rust/blob/eca1ce87084c39667061281e662d5edb9a002882/opentelemetry-sdk/src/metrics/exporter.rs#L11) which form part of the API surface of `opentelemetry-sdk`.
12+
13+
There are various ways to handle errors on trait methods, including swallowing them and logging, panicking, returning a shared global error, or returning a method-specific error. We strive for consistency, and we want to be sure that we've put enough thought into what this looks like that we don't have to make breaking interface changes unnecessarily in the future.
14+
15+
## Design Guidance
16+
17+
### 1. No panics from SDK APIs
18+
Failures during regular operation should not panic. Instead, they should return errors to the caller where appropriate, _or_ log an error where returning one is not appropriate.
19+
Some of the OpenTelemetry SDK interfaces are dictated by the specification in such a way that they may not return errors.
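
A minimal sketch of this guideline, using hypothetical names (`SketchExporter`, `SketchError`, `on_emit`) that are not part of the SDK. The fallible method returns a `Result` instead of panicking, and the wrapper, which stands in for an interface the specification defines as infallible, logs the failure instead of propagating it:

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not an SDK type.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    AlreadyShutdown,
    EmptyBatch,
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::AlreadyShutdown => write!(f, "Shutdown already invoked"),
            SketchError::EmptyBatch => write!(f, "Nothing to export"),
        }
    }
}

impl std::error::Error for SketchError {}

/// Hypothetical exporter that tracks only whether it has been shut down.
pub struct SketchExporter {
    pub is_shutdown: bool,
}

impl SketchExporter {
    /// Fallible operation: reports failure to the caller rather than panicking.
    pub fn export(&self, batch: &[u32]) -> Result<usize, SketchError> {
        if self.is_shutdown {
            return Err(SketchError::AlreadyShutdown);
        }
        if batch.is_empty() {
            return Err(SketchError::EmptyBatch);
        }
        Ok(batch.len())
    }
}

/// Stand-in for an interface that may not return errors: the failure is
/// logged (here to stderr) and then dropped, never raised as a panic.
pub fn on_emit(exporter: &SketchExporter, batch: &[u32]) {
    if let Err(err) = exporter.export(batch) {
        eprintln!("export failed: {err}");
    }
}
```

In the real SDK the logging would presumably go through the project's internal logging facilities rather than `eprintln!`; stderr is used here only to keep the sketch dependency-free.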
20+
21+
### 2. Consolidate error types within a trait where we can, let them diverge when we can't
22+
23+
We aim to consolidate error types where possible _without indicating a function may return more errors than it can actually return_.
24+
25+
**Don't do this** - each function's signature indicates that it returns errors it will _never_ return, forcing the caller to write handlers for dead paths:
26+
```rust
27+
enum MegaError {
28+
TooBig,
29+
TooSmall,
30+
TooLong,
31+
TooShort
32+
}
33+
34+
trait MyTrait {
35+
36+
// Will only ever return TooBig,TooSmall errors
37+
fn action_one() -> Result<(), MegaError>;
38+
39+
// These will only ever return TooLong,TooShort errors
40+
fn action_two() -> Result<(), MegaError>;
41+
fn action_three() -> Result<(), MegaError>;
42+
}
43+
```
44+
45+
**Instead, do this** - each function's signature indicates only the errors it can return, providing an accurate contract to the caller:
46+
47+
```rust
48+
enum ErrorOne {
49+
TooBig,
50+
TooSmall,
51+
}
52+
53+
enum ErrorTwo {
54+
TooLong,
55+
TooShort
56+
}
57+
58+
trait MyTrait {
59+
fn action_one() -> Result<(), ErrorOne>;
60+
61+
// Action two and three share the same error type.
62+
// We do not introduce a common error MyTraitError for all operations, as this would
63+
// force all methods on the trait to indicate they return errors they do not return,
64+
// complicating things for the caller.
65+
fn action_two() -> Result<(), ErrorTwo>;
66+
fn action_three() -> Result<(), ErrorTwo>;
67+
}
68+
```
69+
70+
### 3. Consolidate error types between signals where we can, let them diverge where we can't
71+
72+
Consider the `Exporter`s mentioned earlier. Each of them has the same failure indicators - as dictated by the OpenTelemetry spec - and we will
73+
share the error types accordingly:
74+
75+
**Don't do this** - each signal has its own error type, despite having exactly the same failure cases:
76+
77+
```rust
78+
#[derive(Error, Debug)]
79+
pub enum OtelTraceError {
80+
#[error("Shutdown already invoked")]
81+
AlreadyShutdown,
82+
83+
#[error("Operation failed: {0}")]
84+
InternalFailure(String),
85+
86+
/** ... additional errors ... **/
87+
}
88+
89+
#[derive(Error, Debug)]
90+
pub enum OtelLogError {
91+
#[error("Shutdown already invoked")]
92+
AlreadyShutdown,
93+
94+
#[error("Operation failed: {0}")]
95+
InternalFailure(String),
96+
97+
/** ... additional errors ... **/
98+
}
99+
```
100+
101+
**Instead, do this** - error types are consolidated between signals where this can be done appropriately:
102+
103+
```rust
104+
105+
/// opentelemetry-sdk::error
106+
107+
#[derive(Error, Debug)]
108+
pub enum OTelSdkError {
109+
#[error("Shutdown already invoked")]
110+
AlreadyShutdown,
111+
112+
#[error("Operation failed: {0}")]
113+
InternalFailure(String),
114+
115+
/** ... additional errors ... **/
116+
}
117+
118+
pub type OTelSdkResult = Result<(), OTelSdkError>;
119+
120+
/// signal-specific exporter traits all share the same
121+
/// result types for the exporter operations.
122+
123+
// pub trait LogExporter {
124+
// pub trait SpanExporter {
125+
pub trait PushMetricExporter {
126+
fn export(&self, /* ... */) -> OTelSdkResult;
127+
fn force_flush(&self, /* ... */ ) -> OTelSdkResult;
128+
fn shutdown(&self, /* ... */ ) -> OTelSdkResult;
}
129+
```
130+
131+
If this were _not_ the case - if we needed to mark an extra error for instance for `LogExporter` that the caller could reasonably handle -
132+
we would let the error types diverge at that point.
133+
134+
### 4. Box custom errors where a savvy caller may be able to handle them, stringify them if not
135+
136+
Note above that we do not box any `Error` into `InternalFailure`. Our rule here is that if the caller cannot reasonably be expected to handle a particular error variant, we will use a simplified interface that returns only a descriptive string. In the concrete example we are using with the exporters, we have a [strong signal in the opentelemetry-specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/sdk.md#export) that indicates that the error types _are not actionable_ by the caller.
137+
138+
If the caller may potentially recover from an error, we will follow the generally accepted best practice (e.g., see [Canonical's guide](https://canonical.github.io/rust-best-practices/error-and-panic-discipline.html)) and preserve the nested error instead:
139+
140+
**Don't do this if the OtherError is potentially recoverable by a savvy caller**:
141+
```rust
142+
143+
#[derive(Debug, Error)]
144+
pub enum MyError {
145+
#[error("Error one occurred")]
146+
ErrorOne,
147+
148+
#[error("Operation failed: {0}")]
149+
OtherError(String),
}
150+
```
151+
152+
**Instead, do this**, allowing the caller to match on the nested error:
153+
154+
```rust
155+
#[derive(Debug, Error)]
156+
pub enum MyError {
157+
#[error("Error one occurred")]
158+
ErrorOne,
159+
160+
#[error("Operation failed: {source}")]
161+
OtherError {
162+
#[from]
163+
source: Box<dyn Error + Send + Sync>,
164+
},
165+
}
166+
```
167+
168+
Note that at the time of writing, there is no instance we have identified within the project that has required this.
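
To illustrate what preserving the nested error buys the caller, here is a rough sketch with the `Display` and `Error` implementations written by hand so it needs no dependencies. The `is_retryable` helper and the choice of `std::io::Error` as the recoverable inner error are assumptions made for illustration only:

```rust
use std::error::Error;
use std::fmt;
use std::io;

#[derive(Debug)]
pub enum MyError {
    ErrorOne,
    OtherError {
        source: Box<dyn Error + Send + Sync>,
    },
}

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MyError::ErrorOne => write!(f, "Error one occurred"),
            MyError::OtherError { source } => write!(f, "Operation failed: {source}"),
        }
    }
}

impl Error for MyError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            MyError::ErrorOne => None,
            MyError::OtherError { source } => {
                // Drop the Send + Sync bounds to match the trait's signature.
                let inner: &(dyn Error + 'static) = &**source;
                Some(inner)
            }
        }
    }
}

/// Hypothetical caller-side helper: because the nested error is preserved,
/// the caller can downcast it and decide whether a retry makes sense.
pub fn is_retryable(err: &MyError) -> bool {
    match err {
        MyError::ErrorOne => false,
        MyError::OtherError { source } => source
            .downcast_ref::<io::Error>()
            .map(|io_err| io_err.kind() == io::ErrorKind::TimedOut)
            .unwrap_or(false),
    }
}
```

With the stringified variant, the same decision would require parsing the error message, which is brittle.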
169+
170+
### 5. Use thiserror by default
171+
We will use [thiserror](https://docs.rs/thiserror/latest/thiserror/) by default to implement Rust's [error trait](https://doc.rust-lang.org/core/error/trait.Error.html).
172+
This keeps our code clean, and as it does not appear in our interface, we can choose to replace any particular usage with a hand-rolled implementation should we need to.
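
As a rough illustration of the hand-rolled fallback mentioned above, the sketch below implements `Display` and `Error` by hand for a reduced, two-variant version of the `OTelSdkError` example from this ADR (not the full SDK type). The messages match the `#[error(...)]` attributes shown earlier, so a caller would see the same public interface either way:

```rust
use std::error::Error;
use std::fmt;

/// Reduced version of the ADR's example enum, for illustration only.
#[derive(Debug)]
pub enum OTelSdkError {
    AlreadyShutdown,
    InternalFailure(String),
}

// Hand-written equivalent of the `#[error("...")]` attributes.
impl fmt::Display for OTelSdkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OTelSdkError::AlreadyShutdown => write!(f, "Shutdown already invoked"),
            OTelSdkError::InternalFailure(msg) => write!(f, "Operation failed: {msg}"),
        }
    }
}

// No nested source error here, so the default trait methods suffice.
impl Error for OTelSdkError {}

pub type OTelSdkResult = Result<(), OTelSdkError>;

/// Hypothetical operation showing the type in use.
pub fn shutdown(already_shutdown: bool) -> OTelSdkResult {
    if already_shutdown {
        Err(OTelSdkError::AlreadyShutdown)
    } else {
        Ok(())
    }
}
```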
173+

docs/adr/README.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
# Architectural Decision Records
2+
3+
This directory contains architectural decision records made for the opentelemetry-rust project. These allow us to consolidate discussion, options, and outcomes around key architectural decisions.
4+
5+
* [001 - Error Handling](001_error_handling.md)

docs/design/logs.md

Lines changed: 13 additions & 0 deletions
@@ -296,6 +296,19 @@ convey that decision back to logger, allowing appender to avoid even the cost of
296296
creating a `LogRecord` in the first place if there is no listener. This check is
297297
done for each log emission, and can react dynamically to changes in interest, by
298298
enabling/disabling ETW/user-event listener.
299+
5. `tracing` has a notion of "target", which is expected to be mapped to OTel's
300+
concept of Instrumentation Scope for Logs, when `OpenTelemetry-Tracing-Appender`
301+
bridges `tracing` to OpenTelemetry. Since scopes are tied to Loggers, a naive
302+
approach would require creating a separate logger for each unique target. This
303+
would necessitate an RWLock-protected HashMap lookup, introducing contention and
304+
reducing throughput. To avoid this, `OpenTelemetry-Tracing-Appender` instead
305+
stores the target directly in the LogRecord as a top-level field, ensuring fast
306+
access in the hot path. Components processing the LogRecord can retrieve the
307+
target via LogRecord.target(), treating it as the scope. The OTLP Exporter
308+
already handles this automatically, so end-users will see “target” reflected in
309+
the Instrumentation Scope. An alternative design would be to use thread-local
310+
HashMaps, but that could increase memory usage, as there can be hundreds of
311+
unique targets (because `tracing` defaults to using the module path as the target).
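
The trade-off in point 5 can be sketched roughly as follows. All types here (`SketchLogRecord`, `LoggerPerTarget`, `scope_name`) are simplified, hypothetical stand-ins and not the SDK's or the appender's actual implementation. The first half shows the naive per-target lookup behind an `RwLock`; the second shows the record carrying the target so that a downstream component can treat it as the scope without any shared map:

```rust
use std::collections::HashMap;
use std::sync::{Arc, PoisonError, RwLock};

/// Hypothetical, simplified record; the real SDK type has many more fields.
#[derive(Debug, Default)]
pub struct SketchLogRecord {
    target: Option<&'static str>,
    body: String,
}

impl SketchLogRecord {
    pub fn new(body: &str) -> Self {
        Self { target: None, body: body.to_string() }
    }
    pub fn set_target(&mut self, target: &'static str) {
        self.target = Some(target);
    }
    pub fn target(&self) -> Option<&'static str> {
        self.target
    }
    pub fn body(&self) -> &str {
        &self.body
    }
}

/// Naive approach: one scope entry per target behind an RwLock-protected map.
/// Every emission takes at least a read lock, which is the contention the
/// design wants to avoid.
#[derive(Default)]
pub struct LoggerPerTarget {
    scopes: RwLock<HashMap<String, Arc<String>>>,
}

impl LoggerPerTarget {
    pub fn scope_for(&self, target: &str) -> Arc<String> {
        {
            let map = self.scopes.read().unwrap_or_else(PoisonError::into_inner);
            if let Some(scope) = map.get(target) {
                return Arc::clone(scope);
            }
        } // read guard dropped here, before the write lock is taken
        let mut map = self.scopes.write().unwrap_or_else(PoisonError::into_inner);
        let scope = map
            .entry(target.to_string())
            .or_insert_with(|| Arc::new(target.to_string()));
        Arc::clone(scope)
    }
}

/// Approach described in the text: the record carries the target, and a
/// downstream component falls back to the logger's scope only if it is absent.
pub fn scope_name<'a>(record: &SketchLogRecord, logger_scope: &'a str) -> &'a str {
    match record.target() {
        Some(target) => target,
        None => logger_scope,
    }
}
```

The second approach costs one extra field per record but involves no lock or map lookup in the hot path.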
299312

300313
### Perf test - benchmarks
301314

examples/tracing-grpc/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ opentelemetry_sdk = { path = "../../opentelemetry-sdk", features = ["rt-tokio"]
1919
opentelemetry-stdout = { workspace = true, features = ["trace"] }
2020
prost = { workspace = true }
2121
tokio = { workspace = true, features = ["full"] }
22-
tonic = { workspace = true, features = ["server"] }
22+
tonic = { workspace = true, features = ["server", "codegen", "channel", "prost"] }
2323

2424
[build-dependencies]
2525
tonic-build = { workspace = true }

examples/tracing-grpc/src/client.rs

Lines changed: 17 additions & 6 deletions
@@ -13,7 +13,7 @@ fn init_tracer() -> sdktrace::SdkTracerProvider {
1313
global::set_text_map_propagator(TraceContextPropagator::new());
1414
// Install stdout exporter pipeline to be able to retrieve the collected spans.
1515
let provider = sdktrace::SdkTracerProvider::builder()
16-
.with_batch_exporter(SpanExporter::default())
16+
.with_simple_exporter(SpanExporter::default())
1717
.build();
1818

1919
global::set_tracer_provider(provider.clone());
@@ -43,10 +43,14 @@ async fn greet() -> Result<(), Box<dyn std::error::Error + Send + Sync + 'static
4343
let span = tracer
4444
.span_builder("Greeter/client")
4545
.with_kind(SpanKind::Client)
46-
.with_attributes([KeyValue::new("component", "grpc")])
46+
.with_attributes([
47+
KeyValue::new("rpc.system", "grpc"),
48+
KeyValue::new("server.port", 50052),
49+
KeyValue::new("rpc.method", "say_hello"),
50+
])
4751
.start(&tracer);
4852
let cx = Context::current_with_span(span);
49-
let mut client = GreeterClient::connect("http://[::1]:50051").await?;
53+
let mut client = GreeterClient::connect("http://[::1]:50052").await?;
5054

5155
let mut request = tonic::Request::new(HelloRequest {
5256
name: "Tonic".into(),
@@ -58,16 +62,23 @@ async fn greet() -> Result<(), Box<dyn std::error::Error + Send + Sync + 'static
5862

5963
let response = client.say_hello(request).await;
6064

65+
let span = cx.span();
6166
let status = match response {
62-
Ok(_res) => "OK".to_string(),
67+
Ok(_res) => {
68+
span.set_attribute(KeyValue::new("response", "OK"));
69+
"OK".to_string()
70+
}
6371
Err(status) => {
6472
// Access the status code
6573
let status_code = status.code();
74+
span.set_attribute(KeyValue::new(
75+
"response_code_desc",
76+
status_code.description(),
77+
));
6678
status_code.to_string()
6779
}
6880
};
69-
cx.span()
70-
.add_event("Got response!", vec![KeyValue::new("status", status)]);
81+
span.add_event("Got response!", vec![KeyValue::new("status", status)]);
7182

7283
Ok(())
7384
}
