Commit 15d42b2

fix PR commentts
1 parent ff9756a commit 15d42b2

1 file changed

+114
-98
lines changed

docs/metrics.md

Lines changed: 114 additions & 98 deletions
@@ -5,16 +5,24 @@
55

66
* [Best Practices](#best-practices)
77
* [Metrics API](#metrics-api)
8-
* [Meter](#meter)
9-
* [Instruments](#instruments)
8+
* [Meter](#meter)
9+
* [Instruments](#instruments)
10+
* [Reporting measurements - use array slices for
11+
attributes](#reporting-measurements---use-array-slices-for-attributes)
12+
* [Reporting measurements via synchronous
13+
instruments](#reporting-measurements-via-synchronous-instruments)
14+
* [Reporting measurements via asynchronous
15+
instruments](#reporting-measurements-via-asynchronous-instruments)
1016
* [MeterProvider Management](#meterprovider-management)
1117
* [Memory Management](#memory-management)
12-
* [Example](#example)
13-
* [Pre-Aggregation](#pre-aggregation)
14-
* [Cardinality Limits](#cardinality-limits)
15-
* [Memory Preallocation](#memory-preallocation)
18+
* [Example](#example)
19+
* [Pre-Aggregation](#pre-aggregation)
20+
* [Pre-Aggregation Benefits](#pre-aggregation-benefits)
21+
* [Cardinality Limits](#cardinality-limits)
22+
* [Cardinality Limit - Implications](#cardinality-limit---implications)
23+
* [Memory Preallocation](#memory-preallocation)
1624
* [Metrics Correlation](#metrics-correlation)
17-
* [Metrics Enrichment](#metrics-enrichment)
25+
* [Modelling Metric Attributes](#modelling-metric-attributes)
1826

1927
</details>
2028

@@ -27,11 +35,11 @@ practices.
2735

2836
### Meter
2937

30-
:stop_sign: You should avoid creating
38+
:stop_sign: You should avoid creating duplicate
3139
[`Meter`](https://docs.rs/opentelemetry/latest/opentelemetry/metrics/struct.Meter.html)
32-
too frequently. `Meter` is fairly expensive and meant to be reused throughout
33-
the application. For most applications, a Meter should be obtained from `global`
34-
and saved for re-use.
40+
instances with the same name. `Meter` is fairly expensive and meant to be reused
41+
throughout the application. For most applications, a `Meter` should be obtained
42+
from `global` and saved for re-use.
3543

3644
The fully qualified module name might be a good option for the Meter name.
3745
Optionally, one may create a meter with version, schema_url, and additional
@@ -45,11 +53,11 @@ use opentelemetry::KeyValue;
4553
let scope = InstrumentationScope::builder("my_company.my_product.my_library")
4654
.with_version("0.17")
4755
.with_schema_url("https://opentelemetry.io/schema/1.2.0")
48-
.with_attributes([(KeyValue::new("key", "value"))])
56+
.with_attributes([KeyValue::new("key", "value")])
4957
.build();
5058

5159
// creating Meter with InstrumentationScope, comprising of
52-
// name, schema and attributes.
60+
// name, version, schema and attributes.
5361
let meter = global::meter_with_scope(scope);
5462

5563
// creating Meter with just name
@@ -60,10 +68,10 @@ let meter = global::meter("my_company.my_product.my_library");
6068

6169
:heavy_check_mark: You should understand and pick the right instrument type.
6270

63-
> [!NOTE] Picking the right instrument type for your use case is crucial to
64-
> ensure the correct semantics and performance. Check the [Instrument
65-
Selection](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#instrument-selection)
66-
section from the supplementary guidelines for more information.
71+
> [!NOTE] Picking the right instrument type for your use case is crucial to
72+
> ensure the correct semantics and performance. Check the [Instrument
73+
Selection](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#instrument-selection)
74+
section from the supplementary guidelines for more information.
6775

6876
| OpenTelemetry Specification | OTel Rust Instrument Type |
6977
| --------------------------- | -------------------- |
@@ -75,14 +83,14 @@ let meter = global::meter("my_company.my_product.my_library");
7583
| [Histogram](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#histogram) | [`Histogram`](https://docs.rs/opentelemetry/latest/opentelemetry/metrics/struct.Histogram.html) |
7684
| [UpDownCounter](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#updowncounter) | [`UpDownCounter`](https://docs.rs/opentelemetry/latest/opentelemetry/metrics/struct.UpDownCounter.html) |
7785

78-
:stop_sign: You should avoid creating instruments (e.g. `Counter`) too
79-
frequently. Instruments are fairly expensive and meant to be reused. For most
80-
applications, instruments can be created once and re-used. Instruments can also
81-
be cloned to create multiple handles to the same instrument, but the cloning
82-
should not be on hot path, but instead the cloned instance should be stored and
83-
re-used.
86+
:stop_sign: You should avoid creating duplicate instruments (e.g., `Counter`)
87+
with the same name. Instruments are fairly expensive and meant to be reused
88+
throughout the application. For most applications, an instrument should be
89+
created once and saved for re-use. Instruments can also be cloned to create
90+
multiple handles to the same instrument, but cloning should not occur on the hot
91+
path. Instead, the cloned instance should be stored and reused.
8492

85-
:stop_sign: You should avoid invalid instrument names.
93+
:stop_sign: Do NOT use invalid instrument names.
8694

8795
> [!NOTE] OpenTelemetry will not collect metrics from instruments that are using
8896
> invalid names. Refer to the [OpenTelemetry
@@ -96,47 +104,71 @@ measurements.
96104
> not following the same order as before:
97105
98106
```rust
99-
let counter = meter.u64_counter("example_counter").build();
107+
let counter = meter.u64_counter("fruits_sold").build();
100108
counter.add(2, &[KeyValue::new("color", "red"), KeyValue::new("name", "apple")]);
101109
counter.add(3, &[KeyValue::new("color", "green"), KeyValue::new("name", "lime")]);
102110
counter.add(5, &[KeyValue::new("color", "yellow"), KeyValue::new("name", "lemon")]);
103-
counter.add(8, &[KeyValue::new("name", "lemon"), KeyValue::new("color", "yellow")]); // bad perf
111+
counter.add(8, &[KeyValue::new("name", "lemon"), KeyValue::new("color", "yellow")]); // bad performance
104112
```
105113

106-
:heavy_check_mark: You should use consistent ordering of attributes to achieve
107-
the best performance.
108-
109-
:heavy_check_mark: If feasible, provide the attributes in the sorted order of
110-
`Key`s, to minimize memory usage within the Metrics SDK.
114+
:heavy_check_mark: If feasible, provide the attributes sorted by `Key`s in
115+
ascending order to minimize memory usage within the Metrics SDK.
111116

112117
```rust
113-
let counter = meter.u64_counter("example_counter").build();
118+
let counter = meter.u64_counter("fruits_sold").build();
114119
counter.add(2, &[KeyValue::new("color", "red"), KeyValue::new("name", "apple")]);
115120
```
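
The ordering advice above can be enforced in debug builds with a small check. This is only a sketch: `keys_sorted` is a hypothetical helper, and it uses plain `(&str, &str)` tuples in place of `KeyValue` so that it has no dependencies.

```rust
/// Returns true when the attribute keys are in strictly ascending order.
/// Hypothetical helper; tuples stand in for `KeyValue` (an assumption made
/// to keep the sketch dependency-free).
fn keys_sorted(attrs: &[(&str, &str)]) -> bool {
    // Compare each key with its successor; an empty or single-element
    // slice is trivially sorted.
    attrs.windows(2).all(|pair| pair[0].0 < pair[1].0)
}
```

A wrapper around `counter.add` could call something like `debug_assert!(keys_sorted(attrs))`, so that out-of-order call sites are caught in tests without adding cost to release builds.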
116121

117-
### Reporting measurements via synchronous instruments
122+
### Reporting measurements - use array slices for attributes
118123

119124
:heavy_check_mark: When reporting measurements, use array slices for attributes
120125
rather than creating vectors. Arrays are more efficient as they avoid
121-
unnecessary heap allocations on the measurement path:
126+
unnecessary heap allocations on the measurement path. This is true for both
127+
synchronous and observable instruments.
122128

123129
```rust
124130
// Good practice: Using an array slice directly
125131
counter.add(2, &[KeyValue::new("color", "red"), KeyValue::new("name", "apple")]);
126132

133+
// The same applies inside an observable instrument's callback
134+
let _observable_counter = meter
135+
.u64_observable_counter("request_count")
136+
.with_callback(|observer| {
137+
// Good practice: Using an array slice directly
138+
observer.observe(
139+
100,
140+
&[KeyValue::new("endpoint", "/api")]
141+
)
142+
})
143+
.build();
144+
127145
// Avoid this: Creating a Vec is unnecessary, and it allocates on the heap
128146
// counter.add(2, &vec![KeyValue::new("color", "red"), KeyValue::new("name", "apple")]);
129147
```
130148

131-
This approach minimizes memory allocations on metrics hot paths, improving
132-
overall performance when reporting metrics frequently.
149+
### Reporting measurements via synchronous instruments
150+
151+
:heavy_check_mark: Use a synchronous `Counter` when you need to increment counts at
152+
specific points in your code:
153+
154+
```rust
155+
// Example: Using Counter when incrementing at specific code points
156+
use opentelemetry::KeyValue;
157+
158+
fn process_item(counter: &opentelemetry::metrics::Counter<u64>, item_type: &str) {
159+
// Process item...
160+
161+
// Increment the counter with the item type as an attribute
162+
counter.add(1, &[KeyValue::new("type", item_type)]);
163+
}
164+
```
133165

134166
### Reporting measurements via asynchronous instruments
135167

136168
Asynchronous instruments like `ObservableCounter` are ideal for reporting
137169
metrics that are already being tracked or stored elsewhere in your application.
138-
These instruments allow you to observe and report the current state of existing
139-
values without manually managing the aggregation yourself.
170+
These instruments allow you to observe and report the current state of such
171+
metrics without manually managing the aggregation yourself.
140172

141173
:heavy_check_mark: Use `ObservableCounter` when you already have a variable
142174
tracking a count:
@@ -172,45 +204,24 @@ fn setup_metrics(meter: &opentelemetry::metrics::Meter) {
172204
}
173205
```
174206

175-
:heavy_check_mark: Use regular Counter when you need to increment counts at
176-
specific points in your code:
177-
178-
```rust
179-
// Example: Using Counter when incrementing at specific code points
180-
use opentelemetry::KeyValue;
181-
182-
fn setup_metrics(meter: &opentelemetry::metrics::Meter) -> opentelemetry::metrics::Counter<u64> {
183-
meter
184-
.u64_counter("processed_items")
185-
.with_description("Number of items processed")
186-
.build()
187-
}
188-
189-
// Then in your processing code:
190-
fn process_item(counter: &opentelemetry::metrics::Counter<u64>, item_type: &str) {
191-
// Process item...
192-
193-
// Increment the counter with the item type as an attribute
194-
counter.add(1, &[KeyValue::new("type", item_type)]);
195-
}
196-
```
207+
> [!NOTE] The callbacks in the Observable instruments are invoked by the SDK
208+
during each export cycle.
197209

198210
## MeterProvider Management
199211

200-
:stop_sign: Avoid creating `MeterProvider` instances too frequently. A
201-
`MeterProvider` is resource-intensive and designed to be reused throughout the
202-
application. Typically, one `MeterProvider` instance per process is sufficient.
212+
Most use-cases require you to create ONLY one instance of MeterProvider. You
213+
should NOT create multiple instances of MeterProvider unless you have some
214+
unusual requirement of having different export strategies within the same
215+
application. Using multiple instances of MeterProvider requires users to
216+
exercise caution.
203217

204218
:heavy_check_mark: Properly manage the lifecycle of `MeterProvider` instances if
205219
you create them. Follow these guidelines:
206220

207-
* **Create Once, Reuse Always**: Create a single `MeterProvider` instance and
208-
reuse it throughout the application. Avoid creating multiple instances to
209-
conserve resources.
210-
211221
* **Cloning**: A `MeterProvider` is a handle to an underlying provider. Cloning
212222
it creates a new handle pointing to the same provider. Clone the
213-
`MeterProvider` when necessary.
223+
`MeterProvider` when necessary, but re-use the cloned handle instead of repeatedly
224+
cloning.
214225

215226
* **Set as Global Provider**: Use `opentelemetry::global::set_meter_provider` to
216227
set a clone of the `MeterProvider` as the global provider. This ensures
@@ -220,11 +231,10 @@ you create them. Follow these guidelines:
220231
* **Shutdown**: Explicitly call `shutdown` on the `MeterProvider` at the end of
221232
your application to ensure all metrics are properly flushed and exported.
222233

223-
> [!NOTE] Dropping the last reference to a `MeterProvider` instance implicitly
224-
> calls `shutdown`. However, this does not apply if
225-
> `opentelemetry::global::set_meter_provider` is used to set it as the global
226-
> provider. When set globally, the `MeterProvider` is backed by a static
227-
> reference, preventing automatic drop and shutdown.
234+
> [!NOTE] If you did not use opentelemetry::global::set_meter_provider to set a
235+
> clone of the MeterProvider as the global provider, then you should be aware
236+
> that dropping the last instance of MeterProvider implicitly calls shutdown on
237+
> the provider.
228238
229239
:heavy_check_mark: Always call `shutdown` on the `MeterProvider` at the end of
230240
your application to ensure proper cleanup.
@@ -335,23 +345,26 @@ Pre-aggregation offers several advantages:
335345
1. **Reduced Data Volume**: Summarizes measurements before export, minimizing
336346
network overhead and improving efficiency.
337347
2. **Predictable Resource Usage**: Ensures consistent resource consumption by
338-
applying [cardinality limits](#cardinality-limits) and [memory
339-
preallocation](#memory-preallocation) during SDK initialization. In other words,
340-
metrics storage/network needs remains fixed, irrespective of growing volume or
341-
changing traffic patterns.
342-
3. **Improved Performance**: Reduces computational load on downstream systems,
343-
enabling them to focus on analysis and storage.
344-
345-
> [!NOTE] There is no ability to export raw measurement events instead of using
346-
pre-aggregation.
348+
applying [cardinality limits](#cardinality-limits) and [memory
349+
preallocation](#memory-preallocation) during SDK initialization. In other
350+
words, metrics memory/network usage remains capped, regardless of the volume
351+
of measurements being made. This ensures that resource utilization remains
352+
stable despite fluctuations in traffic volume.
353+
3. **Improved Performance**: Reduces serialization costs as we work with
354+
aggregated data and not the numerous individual measurements. It also reduces
355+
computational load on downstream systems, enabling them to focus on analysis
356+
and storage.
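
As a rough illustration of the data-volume point, the sketch below sums individual measurements per attribute combination. It is not the SDK's implementation: `pre_aggregate` is a hypothetical function, and measurements are modelled as plain `(value, color, name)` tuples.

```rust
use std::collections::BTreeMap;

/// Sums measurements per (color, name) combination.
/// Hypothetical stand-in for what an aggregator does conceptually.
fn pre_aggregate<'a>(
    measurements: &[(u64, &'a str, &'a str)],
) -> BTreeMap<(&'a str, &'a str), u64> {
    let mut sums = BTreeMap::new();
    for &(value, color, name) in measurements {
        // One entry per unique attribute combination, however many
        // measurements share it.
        *sums.entry((color, name)).or_insert(0) += value;
    }
    sums
}
```

Feeding it the four `fruits_sold` measurements used earlier in this document (2 red apples, 3 green limes, 5 and 8 yellow lemons) yields three aggregated points, with the two lemon measurements merged into a single sum of 13.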
357+
358+
> [!NOTE] There is no ability to opt out of pre-aggregation.
347359
348360
### Cardinality Limits
349361

350-
The number of unique combinations of attributes is called cardinality. Taking
351-
the [fruit example](#example), if we know that we can only have apple/lemon as
352-
the name, red/yellow/green as the color, then we can say the cardinality is 6.
353-
No matter how many fruits we sell, we can always use the following table to
354-
summarize the total number of fruits based on the name and color.
362+
The number of unique combinations of attributes for a given metric is referred
363+
to as the cardinality of that metric. Taking the [fruit example](#example), if
364+
we know that we can only have apple/lemon as the name, red/yellow/green as the
365+
color, then we can say the cardinality is 6 (i.e., 2 * 3). No matter how many
366+
fruits we sell, we can always use the following table to summarize the total
367+
number of fruits based on the name and color.
355368

356369
| Name | Color | Count |
357370
| ----- | ------ | ----- |
@@ -362,22 +375,22 @@ summarize the total number of fruits based on the name and color.
362375
| lemon | yellow | 12 |
363376
| lemon | green | 0 |
364377

365-
In other words, we know how much storage and network are needed to collect and
378+
In other words, we know how much memory and network are needed to collect and
366379
transmit these metrics, regardless of the traffic pattern or volume.
367380

368381
In real world applications, the cardinality can be extremely high. Imagine if we
369382
have a long running service and we collect metrics with 7 attributes and each
370383
attribute can have 30 different values. We might eventually end up having to
371-
remember the complete set of all 21,870,000,000 combinations! This cardinality
372-
explosion is a well-known challenge in the metrics space. For example, it can
373-
cause surprisingly high costs in the observability system, or even be leveraged
374-
by hackers to launch a denial-of-service attack.
384+
remember the complete set of 30⁷, or 21.87 billion, combinations! This
385+
cardinality explosion is a well-known challenge in the metrics space. For
386+
example, it can cause surprisingly high costs in the observability system, or
387+
even be leveraged by bad actors to launch a denial-of-service attack.
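
The arithmetic behind both figures is the product of the number of distinct values each attribute can take. A minimal sketch, where `max_cardinality` is a hypothetical helper and not part of any OpenTelemetry API:

```rust
/// Worst-case cardinality of a metric: the product of the number of
/// distinct values per attribute. Returns `None` if the product
/// overflows `u64`.
fn max_cardinality(values_per_attribute: &[u64]) -> Option<u64> {
    values_per_attribute
        .iter()
        .try_fold(1u64, |acc, &n| acc.checked_mul(n))
}
```

With the fruit example, `max_cardinality(&[2, 3])` gives `Some(6)`. With 7 attributes of 30 values each, `max_cardinality(&[30; 7])` gives `Some(21_870_000_000)`.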
375388

376389
[Cardinality
377390
limit](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#cardinality-limits)
378391
is a throttling mechanism which allows the metrics collection system to have a
379-
predictable and reliable behavior when excessive cardinality happens, whether it
380-
was due to a malicious attack or developer making mistakes while writing code.
392+
predictable and reliable behavior when there is a cardinality explosion, be it
393+
due to a malicious attack or a developer making mistakes while writing code.
381394

382395
OpenTelemetry has a default cardinality limit of `2000` per metric. This limit
383396
can be configured at the individual metric level using the [View
@@ -415,7 +428,10 @@ important:
415428
* **Cardinality Capping**: When the limit is reached within an export interval,
416429
any new attribute combination is not individually tracked but instead folded
417430
into a single aggregation with the attribute `{"otel.metric.overflow": true}`.
418-
This ensures that measurements are never lost, even when cardinality explodes.
431+
This preserves the overall accuracy of aggregates (such as Sum, Count, etc.)
432+
even though information about specific attribute combinations is lost. Every
433+
measurement is accounted for - either with its original attributes or within
434+
the overflow bucket.
419435

420436
* **Temporality Effects**: The impact of cardinality limits differs based on the
421437
temporality mode:
@@ -459,7 +475,7 @@ important:
459475
cardinality limit of 3 and we're tracking sales with attributes for `name`,
460476
`color`, and `store_location`:
461477

462-
During a busy sales period at time T1-T4, we record:
478+
During a busy sales period at time (T3, T4], we record:
463479

464480
1. 10 red apples sold at Downtown store
465481
2. 5 yellow lemons sold at Uptown store
@@ -480,8 +496,8 @@ important:
480496
If we later query "How many red apples were sold?" the answer would be 10, not
481497
13, because the Midtown sales were folded into the overflow bucket. Similarly,
482498
queries about "How many items were sold in Midtown?" would return 0, not 3.
483-
However, the total count across all dimensions (i.e How many fruits were sold
484-
in T1-T4 would correctly give 26) would be accurate.
499+
However, the total count across all dimensions (i.e., "How many total fruits were
500+
sold in (T3, T4]?") would still be accurate, correctly giving 26.
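
The numbers in this scenario can be reproduced with a simplified model of cardinality capping. This is a sketch under stated assumptions, not the SDK's actual data structure: `CappedSum` and `record_example_sales` are hypothetical names, and the third sale (8 green apples at Downtown) is an assumed value chosen so that the total matches the 26 quoted above.

```rust
use std::collections::HashMap;

/// Attribute set of one measurement: (name, color, store_location).
type Attrs = (&'static str, &'static str, &'static str);

/// Simplified stand-in for a sum aggregation with a cardinality limit.
struct CappedSum {
    limit: usize,
    tracked: HashMap<Attrs, u64>,
    /// Plays the role of the `otel.metric.overflow` bucket.
    overflow: u64,
}

impl CappedSum {
    fn new(limit: usize) -> Self {
        Self { limit, tracked: HashMap::new(), overflow: 0 }
    }

    fn add(&mut self, value: u64, attrs: Attrs) {
        let at_limit = self.tracked.len() >= self.limit;
        match self.tracked.get_mut(&attrs) {
            // Known combination: keep aggregating it individually.
            Some(sum) => *sum += value,
            // New combination after the limit: fold into overflow.
            None if at_limit => self.overflow += value,
            // New combination below the limit: start tracking it.
            None => {
                self.tracked.insert(attrs, value);
            }
        }
    }

    /// Total across all attribute combinations, including overflow.
    fn total(&self) -> u64 {
        self.tracked.values().sum::<u64>() + self.overflow
    }
}

fn record_example_sales() -> CappedSum {
    let mut sales = CappedSum::new(3);
    sales.add(10, ("apple", "red", "Downtown"));
    sales.add(5, ("lemon", "yellow", "Uptown"));
    // Assumed entry (not shown in this excerpt).
    sales.add(8, ("apple", "green", "Downtown"));
    // Fourth unique combination: exceeds the limit of 3.
    sales.add(3, ("apple", "red", "Midtown"));
    sales
}
```

After `record_example_sales()`, the red-apple Downtown entry holds 10, no Midtown combination is tracked, the overflow bucket holds 3, and `total()` still returns 26.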
485501

486502
This limitation applies regardless of whether the attribute in question is
487503
naturally high-cardinality. Even low-cardinality attributes like "color"
