articles/service-bus-messaging/service-bus-end-to-end-tracing.md
+21 -22 Lines changed: 21 additions & 22 deletions
@@ -1,26 +1,26 @@
1
1
---
2
-
title: Azure Service Bus end-to-end tracing and diagnostics | Microsoft Docs
2
+
title: End-to-end tracing and diagnostics
3
3
description: Overview of Service Bus client diagnostics and end-to-end tracing (client through all the services that are involved in processing.)
4
-
ms.topic: article
5
-
ms.date: 12/21/2022
4
+
ms.topic: concept-article
5
+
ms.date: 12/19/2024
6
6
ms.devlang: csharp
7
-
ms.custom: devx-track-csharp, devx-track-dotnet
7
+
ms.custom: devx-track-csharp, devx-track-dotnet
8
+
# Customer intent: I want to learn how to trace operations from a client through all the services that are involved in processing.
8
9
---
9
10
10
11
# Distributed tracing and correlation through Service Bus messaging
11
12
12
-
One of the common problems in micro services development is the ability to trace operation from a client through all the services that are involved in processing. It's useful for debugging, performance analysis, A/B testing, and other typical diagnostics scenarios.
13
-
One part of this problem is tracking logical pieces of work. It includes message processing result and latency and external dependency calls. Another part is correlation of these diagnostics events beyond process boundaries.
13
+
A common problem in microservices development is tracing an operation from a client through all the services that are involved in processing it. Tracing is useful for debugging, performance analysis, A/B testing, and other typical diagnostics scenarios. One part of this problem is tracking logical pieces of work, which includes the message processing result, latency, and external dependency calls. Another part is correlating these diagnostics events across process boundaries.
14
14
15
-
When a producer sends a message through a queue, it typically happens in the scope of some other logical operation, initiated by some other client or service. The same operation is continued by consumer once it receives a message. Both producer and consumer (and other services that process the operation), presumably emit telemetry events to trace the operation flow and result. In order to correlate such events and trace operation end-to-end, each service that reports telemetry has to stamp every event with a trace context. One library that can help developers have all of this telemetry emitted by default is [NServiceBus](https://docs.particular.net/nservicebus/operations/opentelemetry).
15
+
When a producer sends a message through a queue, it typically happens in the scope of some other logical operation, initiated by some other client or service. The consumer continues the same operation once it receives the message. Both the producer and the consumer (and other services that process the operation) presumably emit telemetry events to trace the operation flow and result. To correlate such events and trace the operation end to end, each service that reports telemetry has to stamp every event with a trace context.
16
16
17
17
Microsoft Azure Service Bus messaging has defined payload properties that producers and consumers should use to pass such trace context.
18
18
The protocol is based on the [W3C Trace-Context](https://www.w3.org/TR/trace-context/).
| Diagnostic-Id | Unique identifier of an external call from producer to the queue. Refer to [W3C Trace-Context traceparent header](https://www.w3.org/TR/trace-context/#traceparent-header) for the format |
23
+
|`Diagnostic-Id`| Unique identifier of an external call from producer to the queue. Refer to [W3C Trace-Context trace parent header](https://www.w3.org/TR/trace-context/#traceparent-header) for the format |
24
24
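As a minimal sketch of the format, the following C# snippet parses a `traceparent`-style value with `System.Diagnostics.ActivityContext.TryParse` (available in .NET 5 and later). The `TraceParentSketch` class and `TryReadTraceId` method are hypothetical names used for illustration only. In a real application, the value would presumably come from the message's `Diagnostic-Id` property rather than from a hard-coded string.

```csharp
using System;
using System.Diagnostics;

public static class TraceParentSketch
{
    // Returns the 32-character hex trace ID from a W3C traceparent value,
    // or null if the value can't be parsed as a traceparent.
    public static string? TryReadTraceId(string diagnosticId)
    {
        // traceparent layout: version-traceid-parentid-traceflags
        return ActivityContext.TryParse(diagnosticId, null, out ActivityContext context)
            ? context.TraceId.ToString()
            : null;
    }
}
```

For example, passing the sample value from the W3C specification, `00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01`, yields the trace ID portion `0af7651916cd43dd8448eb211c80319c`.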
25
25
## Service Bus .NET Client autotracing
26
26
The `ServiceBusProcessor` class of [Azure Messaging Service Bus client for .NET](/dotnet/api/azure.messaging.servicebus.servicebusprocessor) provides tracing instrumentation points that can be hooked by tracing systems or by a piece of client code. The instrumentation allows tracking all calls to the Service Bus messaging service from the client side. If message processing is done by using [`ProcessMessageAsync` of `ServiceBusProcessor`](/dotnet/api/azure.messaging.servicebus.servicebusprocessor.processmessageasync) (message handler pattern), the message processing is also instrumented.
@@ -81,7 +81,7 @@ It doesn't mean that there was a delay in receiving the message. In this scenari
81
81
Service Bus .NET Client library version 7.5.0 and later supports OpenTelemetry in experimental mode. For more information, see [Distributed tracing in .NET SDK](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/core/Azure.Core/samples/Diagnostics.md#opentelemetry-with-azure-monitor-zipkin-and-others).
82
82
83
83
### Tracking without tracing system
84
-
In case your tracing system doesn't support automatic Service Bus calls tracking you may be looking into adding such support into a tracing system or into your application. This section describes diagnostics events sent by Service Bus .NET client.
84
+
If your tracing system doesn't support automatic tracking of Service Bus calls, you might consider adding such support to the tracing system or to your application. This section describes the diagnostics events sent by the Service Bus .NET client.
85
85
86
86
Service Bus .NET Client is instrumented using .NET tracing primitives [System.Diagnostics.Activity](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/ActivityUserGuide.md) and [System.Diagnostics.DiagnosticSource](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/DiagnosticSourceUsersGuide.md).
87
87
@@ -172,7 +172,7 @@ Here's the full list of instrumented operations:
172
172
In some cases, it's desirable to log only part of the events to reduce performance overhead or storage consumption. You could log 'Stop' events only (as in preceding example) or sample percentage of the events.
173
173
`DiagnosticSource` provides a way to achieve it with the `IsEnabled` predicate. For more information, see [Context-Based Filtering in DiagnosticSource](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/DiagnosticSourceUsersGuide.md#context-based-filtering).
174
174
175
-
`IsEnabled`may be called multiple times for a single operation to minimize performance impact.
175
+
`IsEnabled` might be called multiple times for a single operation to minimize performance impact.
176
176
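The filtering idea can be sketched with a standalone `DiagnosticListener`. This is an illustration under stated assumptions, not the Service Bus client's own source: the listener name `Demo.Source`, the event names, and the `CollectingObserver` and `StopOnlyFilterSketch` types are all hypothetical. The subscriber passes an `IsEnabled` predicate that accepts only events whose names end in `.Stop`, and the producer checks `IsEnabled` before writing each event.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical observer that records the names of the events it receives.
public sealed class CollectingObserver : IObserver<KeyValuePair<string, object?>>
{
    public List<string> Received { get; } = new List<string>();

    public void OnNext(KeyValuePair<string, object?> evt) => Received.Add(evt.Key);
    public void OnCompleted() { }
    public void OnError(Exception error) { }
}

public static class StopOnlyFilterSketch
{
    public static List<string> Run()
    {
        // "Demo.Source" is a made-up source name used only for this sketch.
        using var source = new DiagnosticListener("Demo.Source");
        var observer = new CollectingObserver();

        // The predicate enables only events whose names end in ".Stop".
        using IDisposable subscription = source.Subscribe(
            observer,
            (name, arg1, arg2) => name.EndsWith(".Stop", StringComparison.Ordinal));

        foreach (string eventName in new[] { "Demo.Send.Start", "Demo.Send.Stop" })
        {
            // Producer side: skip building and writing the payload when no listener wants it.
            if (source.IsEnabled(eventName))
            {
                source.Write(eventName, new { Entity = "demo-queue" });
            }
        }

        return observer.Received;
    }
}
```

In this sketch, only the `Demo.Send.Stop` event reaches the observer, because the producer never writes events that the predicate rejects. A sampling predicate could follow the same shape by returning `true` for only a fraction of operations.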
177
177
`IsEnabled` is called in following sequence:
178
178
@@ -198,8 +198,8 @@ In presence of multiple `DiagnosticSource` listeners for the same source, it's e
| Diagnostic-Id | Unique identifier of an external call from producer to the queue. Refer to [Request-Id in HTTP protocol](https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.DiagnosticSource/src/HttpCorrelationProtocol.md#request-id) for the rationale, considerations, and format |
202
-
| Correlation-Context | Operation context, which is propagated across all services involved in operation processing. For more information, see [Correlation-Context in HTTP protocol](https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.DiagnosticSource/src/HttpCorrelationProtocol.md#correlation-context)|
201
+
|`Diagnostic-Id`| Unique identifier of an external call from producer to the queue. Refer to [Request-Id in HTTP protocol](https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.DiagnosticSource/src/HttpCorrelationProtocol.md#request-id) for the rationale, considerations, and format |
202
+
|`Correlation-Context`| Operation context, which is propagated across all services involved in operation processing. For more information, see [Correlation-Context in HTTP protocol](https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.DiagnosticSource/src/HttpCorrelationProtocol.md#correlation-context)|
203
203
204
204
## Service Bus .NET Client autotracing
205
205
@@ -258,7 +258,7 @@ If you're running any external code in addition to the Application Insights SDK,
258
258
It doesn't mean that there was a delay in receiving the message. In this scenario, the message has already been received since the message is passed in as a parameter to the SDK code. And, the **name** tag in the App Insights logs (**Process**) indicates that the message is now being processed by your external event processing code. This issue isn't Azure-related. Instead, these metrics refer to the efficiency of your external code given that the message has already been received from Service Bus. See [this file on GitHub](https://github.com/Azure/azure-sdk-for-net/blob/4bab05144ce647cc9e704d46d3763de5f9681ee0/sdk/servicebus/Microsoft.Azure.ServiceBus/src/ServiceBusDiagnosticsSource.cs) to see where the **Process** tag is generated and assigned once the message has been received from Service Bus.
259
259
260
260
### Tracking without tracing system
261
-
In case your tracing system doesn't support automatic Service Bus calls tracking you may be looking into adding such support into a tracing system or into your application. This section describes diagnostics events sent by Service Bus .NET client.
261
+
If your tracing system doesn't support automatic tracking of Service Bus calls, you might consider adding such support to the tracing system or to your application. This section describes the diagnostics events sent by the Service Bus .NET client.
262
262
263
263
Service Bus .NET Client is instrumented using .NET tracing primitives [System.Diagnostics.Activity](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/ActivityUserGuide.md) and [System.Diagnostics.DiagnosticSource](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/DiagnosticSourceUsersGuide.md).
264
264
@@ -313,10 +313,10 @@ In this example, listener logs duration, result, unique identifier, and start ti
313
313
314
314
#### Events
315
315
316
-
For every operation, two events are sent: 'Start' and 'Stop'.
317
-
Most probably, you're only interested in 'Stop' events. They provide the result of operation, and start time and duration as Activity properties.
316
+
For every operation, two events are sent: Start and Stop.
317
+
Most probably, you're only interested in Stop events. They provide the result of the operation, and the start time and duration as Activity properties.
318
318
319
-
Event payload provides a listener with the context of the operation, it replicates API incoming parameters and return value. 'Stop' event payload has all the properties of 'Start' event payload, so you can ignore 'Start' event completely.
319
+
The event payload provides a listener with the context of the operation. It replicates the API's incoming parameters and return value. The Stop event payload has all the properties of the Start event payload, so you can ignore the Start event completely.
320
320
321
321
All events also have 'Entity' and 'Endpoint' properties.
322
322
* `string Entity` - Name of the entity (queue, topic, etc.)
@@ -357,8 +357,8 @@ In every event, you can access `Activity.Current` that holds current operation c
357
357
`Activity.Current` provides detailed context of current operation and its parents. For more information, see [Activity documentation](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/ActivityUserGuide.md).
358
358
Service Bus instrumentation provides more information in the `Activity.Current.Tags` - they hold `MessageId` and `SessionId` whenever they're available.
359
359
360
-
Activities that track 'Receive', 'Peek' and 'ReceiveDeferred' event also may have `RelatedTo` tag. It holds distinct list of `Diagnostic-Id`(s) of messages that were received as a result.
361
-
Such operation may result in several unrelated messages to be received. Also, the `Diagnostic-Id` isn't known when operation starts, so 'Receive' operations could be correlated to 'Process' operations using this Tag only. It's useful when analyzing performance issues to check how long it took to receive the message.
360
+
Activities that track Receive, Peek, and ReceiveDeferred events might also have a `RelatedTo` tag. It holds a distinct list of the `Diagnostic-Id` values of the messages that were received as a result.
361
+
Such an operation might result in several unrelated messages being received. Also, the `Diagnostic-Id` isn't known when the operation starts, so Receive operations can be correlated to Process operations only by using this tag. It's useful when analyzing performance issues to check how long it took to receive the message.
362
362
363
363
Efficient way to log Tags is to iterate over them, so adding Tags to the preceding example looks like
364
364
@@ -380,7 +380,7 @@ serviceBusLogger.LogInformation($"{currentActivity.OperationName} is finished, D
380
380
In some cases, it's desirable to log only part of the events to reduce performance overhead or storage consumption. You could log 'Stop' events only (as in preceding example) or sample percentage of the events.
381
381
`DiagnosticSource` provides a way to achieve it with the `IsEnabled` predicate. For more information, see [Context-Based Filtering in DiagnosticSource](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/DiagnosticSourceUsersGuide.md#context-based-filtering).
382
382
383
-
`IsEnabled`may be called multiple times for a single operation to minimize performance impact.
383
+
`IsEnabled` might be called multiple times for a single operation to minimize performance impact.
384
384
385
385
`IsEnabled` is called in following sequence:
386
386
@@ -400,8 +400,7 @@ In presence of multiple `DiagnosticSource` listeners for the same source, it's e
400
400
401
401
---
402
402
403
-
## Next steps
404
-
403
+
## Related content
404
+
* One library that can help developers have the telemetry emitted by default is [NServiceBus](https://docs.particular.net/nservicebus/operations/opentelemetry).
* [Application Insights Monitor Dependencies](/azure/azure-monitor/app/asp-net-dependencies) to see if REST, SQL, or other external resources are slowing you down.
407
406
* [Track custom operations with Application Insights .NET SDK](/azure/azure-monitor/app/custom-operations-tracking)
articles/service-bus-messaging/service-bus-messaging-exceptions-latest.md
+11 -11 Lines changed: 11 additions & 11 deletions
@@ -3,7 +3,7 @@ title: Azure Service Bus - messaging exceptions | Microsoft Docs
3
3
description: This article provides a list of Azure Service Bus messaging exceptions and suggested actions to take when an exception occurs.
4
4
ms.topic: article
5
5
ms.custom: devx-track-dotnet
6
-
ms.date: 02/17/2023
6
+
ms.date: 12/19/2024
7
7
---
8
8
9
9
# Service Bus messaging exceptions (.NET)
@@ -206,12 +206,12 @@ We recommend that you follow these verification steps, depending on the type of
206
206
207
207
#### Causes
208
208
209
-
- During asynchronous replication (replication lag greater than zero), the client tries to perform an operation on a service bus entity (queue, topic) or performs a management operation, but the operation cannot be completed because the replication lag between the primary and the secondary regions has exceeded the maximum allowed replication lag in seconds.
210
-
-**Example**: The operation is being throttled because with it the new replication lag would reach 38323 seconds, which is greater than the maximum replication lag that was set (300 seconds). The current replication lag for the latest operation being replicated is 0 seconds.
209
+
- During asynchronous replication (replication lag greater than zero), the client tries to perform an operation on a service bus entity (queue, topic) or performs a management operation, but the operation can't be completed because the replication lag between the primary and the secondary regions has exceeded the maximum allowed replication lag in seconds.
210
+
  - **Example**: The operation is being throttled because with it the new replication lag would reach 38,323 seconds, which is greater than the maximum replication lag that was set (300 seconds). The current replication lag for the latest operation being replicated is 0 seconds.
211
211
- The replication queue for an entity exceeds its maximum size in bytes. The maximum size in bytes for a replication queue is an internal limit set by Service Bus.
- In synchronous replication, a request times out while waiting for another request to replicate.
214
-
-**Example**: High volume of requests from client application for skarri-storage-exp1(westus3)/q1:MessagingJournal. Replication to other region(s) is in progress.
214
+
  - **Example**: High volume of requests from client application for skarri-storage-exp1(westus3)/q1:MessagingJournal. Replication to other regions is in progress.
215
215
216
216
#### Resolution
217
217
@@ -221,26 +221,26 @@ We recommend that you follow these verification steps, depending on the type of
221
221
222
222
#### Cause
223
223
224
-
- A timeout exception in Geo DR means that the operation did not complete within the client-provided timeout.
224
+
- A timeout exception in Geo DR means that the operation didn't complete within the client-provided timeout.
225
225
- In synchronous replication, an operation’s primary region write and replication to secondary regions are within the scope of the operation’s timeout.
226
-
- In asynchronous replication, an operation’s primary region write is within the scope of the operation’s timeout, but an operation’s replication to secondary regions is not within the scope of the operation’s timeout.
227
-
-**Example**: The operation did not complete within the allocated time 00:01:00 for object message. (ServiceTimeout).
226
+
- In asynchronous replication, an operation’s primary region write is within the scope of the operation’s timeout, but an operation’s replication to secondary regions isn't within the scope of the operation’s timeout.
227
+
  - **Example**: The operation didn't complete within the allocated time 00:01:00 for object message. (ServiceTimeout).
228
228
229
229
#### Resolution
230
230
231
231
- The client should retry the operation.
232
-
-Note that some steps of a timed-out operation may have been completed. It’s possible that a timed-out operation may have been written to the primary region and some secondary regions. If an operation has been written to the primary region, it will eventually be replicated to all secondary regions regardless of client timeout.
232
+
  - Some steps of a timed-out operation might have been completed. It’s possible that a timed-out operation might have been written to the primary region and some secondary regions. If an operation has been written to the primary region, it will eventually be replicated to all secondary regions regardless of client timeout.
233
233
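The retry guidance above can be sketched as a small helper. This is a generic illustration under stated assumptions, not the Service Bus client's built-in retry policy: `RetrySketch` and `RetryOnTimeoutAsync` are hypothetical names, and the sketch catches `System.TimeoutException` as a stand-in for whatever timeout exception your client surfaces. Because a timed-out operation might already have been written to the primary region, the retried operation should be safe to repeat.

```csharp
using System;
using System.Threading.Tasks;

public static class RetrySketch
{
    // Retries the operation when it times out, doubling the delay between attempts.
    // The last failed attempt rethrows the timeout to the caller.
    public static async Task<T> RetryOnTimeoutAsync<T>(
        Func<Task<T>> operation,
        int maxAttempts = 3,
        TimeSpan? initialDelay = null)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        TimeSpan delay = initialDelay ?? TimeSpan.FromSeconds(1);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (TimeoutException) when (attempt < maxAttempts)
            {
                // Wait before the next attempt, then back off exponentially.
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

A caller would wrap the send or management call in the `operation` delegate. If your client library already applies its own retry policy, an extra layer like this one might be unnecessary.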
234
234
### BadRequest
235
235
236
236
#### Cause
237
237
238
-
- During a planned failover, the primary region is temporarily set as read-only in order to allow the secondary region to catch up. If the client attempts a write operation to the primary region while it is in this temporary read-only state, then the client will be receive a BadRequest exception.
238
+
- During a planned failover, the primary region is temporarily set as read-only in order to allow the secondary region to catch up. If the client attempts a write operation to the primary region while it is in this temporary read-only state, then the client receives a BadRequest exception.
239
239
  - **Example**: Replication role switch in progress, primary replica:<entity-name> is ReadOnly.
240
240
241
241
#### Resolution
242
-
- The client must wait for planned failover to complete before write operations will succeed.
243
-
- In case planned failover takes too long, it is possible to trigger a forced failover instead.
242
+
- The client must wait for planned failover to complete before write operations succeed.
243
+
- In case planned failover takes too long, it's possible to trigger a forced failover instead.