Expose metric for log export failure #6709 #6779
harshitrjpt wants to merge 2 commits into open-telemetry:main
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #6779 +/- ##
============================================
+ Coverage 90.10% 90.11% +0.01%
- Complexity 6541 6542 +1
============================================
Files 728 728
Lines 19695 19703 +8
Branches 1935 1935
============================================
+ Hits 17746 17756 +10
+ Misses 1349 1347 -2
Partials 600 600
    .build();
logsExportFailureCounter =
    meter
        .counterBuilder("logsExportFailure")
While this seems like a small change, I'm reluctant to make it because there have been some attempts to standardize the SDKs' internal telemetry (e.g. OTEP#238).
The problem with continuing the pattern of these current metrics is that the structure doesn't conform to our semantic convention recommendations.
- The unit is wrong; it should probably be {export} instead of 1
- The metric name doesn't include a namespace
- The attributes don't have a namespace
Extending the instrumentation extends bad patterns. Fixing the bad patterns exposes our users to breaking changes, only to have more later if / when semantic conventions emerge. So we appear to be stuck. I'll bring it up at next week's Java SIG to see if we can reach any conclusion.
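The three conformance issues listed in this comment can be illustrated with a small, self-contained sketch. `MetricConventionCheck` is a hypothetical helper, not part of opentelemetry-java, and its rules are a deliberate simplification that only covers the points raised here: a dot-separated namespace in the metric name, an annotation-style unit such as {export} for a count, and namespaced attribute keys. The attribute key `type` and the `example.*` names are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of opentelemetry-java. */
final class MetricConventionCheck {

  /** Returns one message per issue of the kind raised in the review comment. */
  static List<String> problems(String name, String unit, List<String> attributeKeys) {
    List<String> problems = new ArrayList<>();
    // Simplified rule: a namespaced metric name contains at least one dot.
    if (!name.contains(".")) {
      problems.add("metric name has no namespace: " + name);
    }
    // Simplified rule for count-style metrics: the unit is an annotation like {export}.
    if (!(unit.startsWith("{") && unit.endsWith("}"))) {
      problems.add("unit is not an annotation such as {export}: " + unit);
    }
    // Simplified rule: a namespaced attribute key contains at least one dot.
    for (String key : attributeKeys) {
      if (!key.contains(".")) {
        problems.add("attribute has no namespace: " + key);
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    // Name and unit as discussed in the review; the attribute key is an assumption.
    System.out.println(problems("logsExportFailure", "1", List.of("type")));
    // A made-up namespaced alternative, shown only for contrast.
    System.out.println(
        problems("example.sdk.log.export.failed", "{export}", List.of("example.sdk.component")));
  }
}
```

The first call reports all three issues and the second reports none, which is the gap between the existing pattern and a convention-style metric that the comment describes.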
@jack-berg I agree that the current pattern (for the existing metrics as well as any proposed in the future) doesn't conform to the semantic convention recommendations, such as namespaced metric names and well-defined units.
So we are stuck between extending or rectifying the existing instrumentation, with its bad semantics, and adopting the recommended conventions. Do let us know how the discussion goes, since this applies in general, not just here.
    .build();
logsExportFailureCounter =
    meter
        .counterBuilder("logsExportFailure")
The OTLP exporters already have dedicated metrics to track failures: https://github.com/open-telemetry/opentelemetry-java/blob/main/exporters/common/src/main/java/io/opentelemetry/exporter/internal/ExporterMetrics.java
Would these serve your needs?
@jack-berg Thanks. I think this does address the requirement. I had tried to find out whether something already exists for exporters in general, since this is a generic need for any kind of exporter, not just BatchLogExporter.
Let me check with the original reporter of the issue. I enabled 'OTEL_EXPORTER_METRICS_ENABLED' and got this output:
ScopeMetrics #2
ScopeMetrics SchemaURL:
InstrumentationScope io.opentelemetry.exporters.otlp-grpc
Metric #0
Descriptor:
-> Name: otlp.exporter.exported
-> Description:
-> Unit:
-> DataType: Sum
-> IsMonotonic: true
-> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
-> success: Bool(false)
-> type: Str(log)
StartTimestamp: 2024-10-14 10:06:54.40763 +0000 UTC
Timestamp: 2024-10-14 10:10:54.418291 +0000 UTC
Value: 9
Metric #1
Descriptor:
-> Name: otlp.exporter.seen
-> Description:
-> Unit:
-> DataType: Sum
-> IsMonotonic: true
-> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
-> type: Str(log)
StartTimestamp: 2024-10-14 10:06:54.40763 +0000 UTC
Timestamp: 2024-10-14 10:10:54.418291 +0000 UTC
Value: 9
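As a rough sketch of how this output answers the original request, the failure count can be read from `otlp.exporter.exported` by filtering on `success=false` and `type=log`. The `Point` class and `failedLogExports` method below are hypothetical stand-ins for whatever a metrics backend provides, and the code assumes that the `success=false` data point counts log records whose export failed.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; mirrors the data points in the collector output above. */
final class ExportFailureCount {

  /** Minimal stand-in for one exported data point. */
  static final class Point {
    final String metric;
    final Map<String, Object> attributes;
    final long value;

    Point(String metric, Map<String, Object> attributes, long value) {
      this.metric = metric;
      this.attributes = attributes;
      this.value = value;
    }
  }

  /** Sums otlp.exporter.exported points with type=log and success=false. */
  static long failedLogExports(List<Point> points) {
    long failed = 0;
    for (Point p : points) {
      if ("otlp.exporter.exported".equals(p.metric)
          && "log".equals(p.attributes.get("type"))
          && Boolean.FALSE.equals(p.attributes.get("success"))) {
        failed += p.value;
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    // Values copied from the two data points shown in the collector output.
    List<Point> points =
        List.of(
            new Point("otlp.exporter.exported", Map.of("success", false, "type", "log"), 9),
            new Point("otlp.exporter.seen", Map.of("type", "log"), 9));
    System.out.println("failed log exports: " + failedLogExports(points));
  }
}
```

With the values shown, the failed count equals the `otlp.exporter.seen` count, so under the stated assumption every log record seen by the exporter failed to export.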
These metrics should be enabled by default if using autoconfigure. Note if not using autoconfigure, you need to carefully order the initialization so that the configured meter provider can be passed to the OTLP exporters for spans and logs to collect internal telemetry.
I'm not sure what OTEL_EXPORTER_METRICS_ENABLED is a reference to. It's not a property that's used in this repository.
My bad, I didn't back up the entire collector logs and mistakenly assumed that these metrics needed to be enabled. They are present by default.
Where does this PR stand? Do we need it if we already have the generic metrics?
There are now standard semantic conventions for this: https://opentelemetry.io/docs/specs/semconv/otel/sdk-metrics/
#7895 is working on the implementation for traces, and there will be a follow-up for logs.
Ran locally; the OTel Collector log shows the metric as follows: