Commit c4620b2

Merge pull request #292213 from spelluru/kafkaseo1218
Service Bus Freshness - 1219
2 parents 6489fe2 + 6d46ae2 commit c4620b2

4 files changed (+40, -43 lines)

articles/service-bus-messaging/service-bus-end-to-end-tracing.md

Lines changed: 21 additions & 22 deletions
@@ -1,26 +1,26 @@
 ---
-title: Azure Service Bus end-to-end tracing and diagnostics | Microsoft Docs
+title: End-to-end tracing and diagnostics
 description: Overview of Service Bus client diagnostics and end-to-end tracing (client through all the services that are involved in processing.)
-ms.topic: article
-ms.date: 12/21/2022
+ms.topic: concept-article
+ms.date: 12/19/2024
 ms.devlang: csharp
-ms.custom: devx-track-csharp, devx-track-dotnet
+ms.custom: devx-track-csharp, devx-track-dotnet"
+# Customer intent: I want o learn how to trace operations from a client through all the services that are involving in processing.
 ---

 # Distributed tracing and correlation through Service Bus messaging

-One of the common problems in micro services development is the ability to trace operation from a client through all the services that are involved in processing. It's useful for debugging, performance analysis, A/B testing, and other typical diagnostics scenarios.
-One part of this problem is tracking logical pieces of work. It includes message processing result and latency and external dependency calls. Another part is correlation of these diagnostics events beyond process boundaries.
+One of the common problems in micro services development is the ability to trace operation from a client through all the services that are involved in processing. It's useful for debugging, performance analysis, A/B testing, and other typical diagnostics scenarios. One part of this problem is tracking logical pieces of work, which includes message processing result and latency and external dependency calls. Another part is correlation of these diagnostics events beyond process boundaries.

-When a producer sends a message through a queue, it typically happens in the scope of some other logical operation, initiated by some other client or service. The same operation is continued by consumer once it receives a message. Both producer and consumer (and other services that process the operation), presumably emit telemetry events to trace the operation flow and result. In order to correlate such events and trace operation end-to-end, each service that reports telemetry has to stamp every event with a trace context. One library that can help developers have all of this telemetry emitted by default is [NServiceBus](https://docs.particular.net/nservicebus/operations/opentelemetry).
+When a producer sends a message through a queue, it typically happens in the scope of some other logical operation, initiated by some other client or service. The same operation is continued by consumer once it receives a message. Both producer and consumer (and other services that process the operation), presumably emit telemetry events to trace the operation flow and result. In order to correlate such events and trace operation end-to-end, each service that reports telemetry has to stamp every event with a trace context.

 Microsoft Azure Service Bus messaging has defined payload properties that producers and consumers should use to pass such trace context.
 The protocol is based on the [W3C Trace-Context](https://www.w3.org/TR/trace-context/).

 # [Azure.Messaging.ServiceBus SDK (Latest)](#tab/net-standard-sdk-2)
 | Property Name | Description |
 |----------------------|-------------------------------------------------------------|
-| Diagnostic-Id | Unique identifier of an external call from producer to the queue. Refer to [W3C Trace-Context traceparent header](https://www.w3.org/TR/trace-context/#traceparent-header) for the format |
+| `Diagnostic-Id` | Unique identifier of an external call from producer to the queue. Refer to [W3C Trace-Context trace parent header](https://www.w3.org/TR/trace-context/#traceparent-header) for the format |

 ## Service Bus .NET Client autotracing
 The `ServiceBusProcessor` class of [Azure Messaging Service Bus client for .NET](/dotnet/api/azure.messaging.servicebus.servicebusprocessor) provides tracing instrumentation points that can be hooked by tracing systems, or piece of client code. The instrumentation allows tracking all calls to the Service Bus messaging service from client side. If message processing is done by using [`ProcessMessageAsync` of `ServiceBusProcessor`](/dotnet/api/azure.messaging.servicebus.servicebusprocessor.processmessageasync) (message handler pattern), the message processing is also instrumented.
@@ -81,7 +81,7 @@ It doesn't mean that there was a delay in receiving the message. In this scenari
 Service Bus .NET Client library version 7.5.0 and later supports OpenTelemetry in experimental mode. For more information, see [Distributed tracing in .NET SDK](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/core/Azure.Core/samples/Diagnostics.md#opentelemetry-with-azure-monitor-zipkin-and-others).

 ### Tracking without tracing system
-In case your tracing system doesn't support automatic Service Bus calls tracking you may be looking into adding such support into a tracing system or into your application. This section describes diagnostics events sent by Service Bus .NET client.
+In case your tracing system doesn't support automatic Service Bus calls tracking you might be looking into adding such support into a tracing system or into your application. This section describes diagnostics events sent by Service Bus .NET client.

 Service Bus .NET Client is instrumented using .NET tracing primitives [System.Diagnostics.Activity](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/ActivityUserGuide.md) and [System.Diagnostics.DiagnosticSource](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/DiagnosticSourceUsersGuide.md).

@@ -172,7 +172,7 @@ Here's the full list of instrumented operations:
 In some cases, it's desirable to log only part of the events to reduce performance overhead or storage consumption. You could log 'Stop' events only (as in preceding example) or sample percentage of the events.
 `DiagnosticSource` provide way to achieve it with `IsEnabled` predicate. For more information, see [Context-Based Filtering in DiagnosticSource](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/DiagnosticSourceUsersGuide.md#context-based-filtering).

-`IsEnabled` may be called multiple times for a single operation to minimize performance impact.
+`IsEnabled` might be called multiple times for a single operation to minimize performance impact.

 `IsEnabled` is called in following sequence:

@@ -198,8 +198,8 @@ In presence of multiple `DiagnosticSource` listeners for the same source, it's e

 | Property Name | Description |
 |----------------------|-------------------------------------------------------------|
-| Diagnostic-Id | Unique identifier of an external call from producer to the queue. Refer to [Request-Id in HTTP protocol](https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.DiagnosticSource/src/HttpCorrelationProtocol.md#request-id) for the rationale, considerations, and format |
-| Correlation-Context | Operation context, which is propagated across all services involved in operation processing. For more information, see [Correlation-Context in HTTP protocol](https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.DiagnosticSource/src/HttpCorrelationProtocol.md#correlation-context) |
+| `Diagnostic-Id` | Unique identifier of an external call from producer to the queue. Refer to [Request-Id in HTTP protocol](https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.DiagnosticSource/src/HttpCorrelationProtocol.md#request-id) for the rationale, considerations, and format |
+| `Correlation-Context` | Operation context, which is propagated across all services involved in operation processing. For more information, see [Correlation-Context in HTTP protocol](https://github.com/dotnet/runtime/blob/master/src/libraries/System.Diagnostics.DiagnosticSource/src/HttpCorrelationProtocol.md#correlation-context) |

 ## Service Bus .NET Client autotracing

@@ -258,7 +258,7 @@ If you're running any external code in addition to the Application Insights SDK,
 It doesn't mean that there was a delay in receiving the message. In this scenario, the message has already been received since the message is passed in as a parameter to the SDK code. And, the **name** tag in the App Insights logs (**Process**) indicates that the message is now being processed by your external event processing code. This issue isn't Azure-related. Instead, these metrics refer to the efficiency of your external code given that the message has already been received from Service Bus. See [this file on GitHub](https://github.com/Azure/azure-sdk-for-net/blob/4bab05144ce647cc9e704d46d3763de5f9681ee0/sdk/servicebus/Microsoft.Azure.ServiceBus/src/ServiceBusDiagnosticsSource.cs) to see where the **Process** tag is generated and assigned once the message has been received from Service Bus.

 ### Tracking without tracing system
-In case your tracing system doesn't support automatic Service Bus calls tracking you may be looking into adding such support into a tracing system or into your application. This section describes diagnostics events sent by Service Bus .NET client.
+In case your tracing system doesn't support automatic Service Bus calls tracking you might be looking into adding such support into a tracing system or into your application. This section describes diagnostics events sent by Service Bus .NET client.

 Service Bus .NET Client is instrumented using .NET tracing primitives [System.Diagnostics.Activity](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/ActivityUserGuide.md) and [System.Diagnostics.DiagnosticSource](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/DiagnosticSourceUsersGuide.md).

@@ -313,10 +313,10 @@ In this example, listener logs duration, result, unique identifier, and start ti

 #### Events

-For every operation, two events are sent: 'Start' and 'Stop'.
-Most probably, you're only interested in 'Stop' events. They provide the result of operation, and start time and duration as Activity properties.
+For every operation, two events are sent: Start and Stop.
+Most probably, you're only interested in Stop events. They provide the result of operation, and start time and duration as Activity properties.

-Event payload provides a listener with the context of the operation, it replicates API incoming parameters and return value. 'Stop' event payload has all the properties of 'Start' event payload, so you can ignore 'Start' event completely.
+Event payload provides a listener with the context of the operation. It replicates API incoming parameters and return value. 'Stop' event payload has all the properties of 'Start' event payload, so you can ignore 'Start' event completely.

 All events also have 'Entity' and 'Endpoint' properties.
 * `string Entity` - - Name of the entity (queue, topic, etc.)
@@ -357,8 +357,8 @@ In every event, you can access `Activity.Current` that holds current operation c
 `Activity.Current` provides detailed context of current operation and its parents. For more information, see [Activity documentation](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/ActivityUserGuide.md).
 Service Bus instrumentation provides more information in the `Activity.Current.Tags` - they hold `MessageId` and `SessionId` whenever they're available.

-Activities that track 'Receive', 'Peek' and 'ReceiveDeferred' event also may have `RelatedTo` tag. It holds distinct list of `Diagnostic-Id`(s) of messages that were received as a result.
-Such operation may result in several unrelated messages to be received. Also, the `Diagnostic-Id` isn't known when operation starts, so 'Receive' operations could be correlated to 'Process' operations using this Tag only. It's useful when analyzing performance issues to check how long it took to receive the message.
+Activities that track Receive, Peek, and ReceiveDeferred event also might have `RelatedTo` tag. It holds distinct list of `Diagnostic-Id`(s) of messages that were received as a result.
+Such operation might result in several unrelated messages to be received. Also, the `Diagnostic-Id` isn't known when operation starts, so 'Receive' operations could be correlated to 'Process' operations using this Tag only. It's useful when analyzing performance issues to check how long it took to receive the message.

 Efficient way to log Tags is to iterate over them, so adding Tags to the preceding example looks like

@@ -380,7 +380,7 @@ serviceBusLogger.LogInformation($"{currentActivity.OperationName} is finished, D
 In some cases, it's desirable to log only part of the events to reduce performance overhead or storage consumption. You could log 'Stop' events only (as in preceding example) or sample percentage of the events.
 `DiagnosticSource` provide way to achieve it with `IsEnabled` predicate. For more information, see [Context-Based Filtering in DiagnosticSource](https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/DiagnosticSourceUsersGuide.md#context-based-filtering).

-`IsEnabled` may be called multiple times for a single operation to minimize performance impact.
+`IsEnabled` might be called multiple times for a single operation to minimize performance impact.

 `IsEnabled` is called in following sequence:

@@ -400,8 +400,7 @@ In presence of multiple `DiagnosticSource` listeners for the same source, it's e

 ---

-## Next steps
-
+## Related content
+* One library that can help developers have the telemetry emitted by default is [NServiceBus](https://docs.particular.net/nservicebus/operations/opentelemetry).
 * [Application Insights Correlation](/azure/azure-monitor/app/distributed-tracing-telemetry-correlation)
-* [Application Insights Monitor Dependencies](/azure/azure-monitor/app/asp-net-dependencies) to see if REST, SQL, or other external resources are slowing you down.
 * [Track custom operations with Application Insights .NET SDK](/azure/azure-monitor/app/custom-operations-tracking)
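For the Azure.Messaging.ServiceBus SDK, the `Diagnostic-Id` row in the first hunk points to the W3C Trace-Context traceparent header for the value format. A minimal C# sketch of reading such a value follows, assuming a version `00` traceparent laid out as `version-traceid-parentid-flags` (2, 32, 16, and 2 lowercase hex digits). `TryParseTraceParent` and `IsLowerHex` are hypothetical helpers written for illustration; they aren't part of the Service Bus SDK.

```csharp
using System;
using System.Globalization;

string sample = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
if (TryParseTraceParent(sample, out var trace, out var parent, out var traceFlags))
{
    // Bit 0 of trace-flags is the "sampled" flag.
    bool sampled = (traceFlags & 0x01) != 0;
    // prints: trace-id=4bf92f3577b34da6a3ce929d0e0e4736 parent-id=00f067aa0ba902b7 sampled=True
    Console.WriteLine($"trace-id={trace} parent-id={parent} sampled={sampled}");
}

// Hypothetical helper: splits a version-00 traceparent value (the format the
// Diagnostic-Id property refers to) into trace-id, parent-id, and trace-flags.
static bool TryParseTraceParent(string value, out string traceId, out string parentId, out byte flags)
{
    traceId = string.Empty;
    parentId = string.Empty;
    flags = 0;
    if (string.IsNullOrEmpty(value)) return false;

    // Only version 00 is handled: exactly four dash-separated fields.
    string[] parts = value.Split('-');
    if (parts.Length != 4 || parts[0] != "00") return false;
    if (!IsLowerHex(parts[1], 32) || !IsLowerHex(parts[2], 16) || !IsLowerHex(parts[3], 2))
        return false;

    // The specification treats all-zero trace-id and parent-id values as invalid.
    if (parts[1] == new string('0', 32) || parts[2] == new string('0', 16)) return false;

    traceId = parts[1];
    parentId = parts[2];
    flags = byte.Parse(parts[3], NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    return true;
}

// Hypothetical helper: true when s has the given length and only lowercase hex digits.
static bool IsLowerHex(string s, int length)
{
    if (s.Length != length) return false;
    foreach (char c in s)
    {
        bool isDigit = c >= '0' && c <= '9';
        bool isLowerAf = c >= 'a' && c <= 'f';
        if (!isDigit && !isLowerAf) return false;
    }
    return true;
}
```

Validation here is limited to field count, field lengths, lowercase hex, and the all-zero checks. On .NET 5 and later, `System.Diagnostics.ActivityContext.TryParse` is likely a better fit for production code than a hand-written parser.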

articles/service-bus-messaging/service-bus-messaging-exceptions-latest.md

Lines changed: 11 additions & 11 deletions
@@ -3,7 +3,7 @@ title: Azure Service Bus - messaging exceptions | Microsoft Docs
 description: This article provides a list of Azure Service Bus messaging exceptions and suggested actions to taken when the exception occurs.
 ms.topic: article
 ms.custom: devx-track-dotnet
-ms.date: 02/17/2023
+ms.date: 12/19/2024
 ---

 # Service Bus messaging exceptions (.NET)
@@ -206,12 +206,12 @@ We recommend that you follow these verification steps, depending on the type of

 #### Causes

-- During asynchronous replication (replication lag greater than zero), the client tries to perform an operation on a service bus entity (queue, topic) or performs a management operation, but the operation cannot be completed because the replication lag between the primary and the secondary regions has exceeded the maximum allowed replication lag in seconds.
-- **Example**: The operation is being throttled because with it the new replication lag would reach 38323 seconds, which is greater than the maximum replication lag that was set (300 seconds). The current replication lag for the latest operation being replicated is 0 seconds.
+- During asynchronous replication (replication lag greater than zero), the client tries to perform an operation on a service bus entity (queue, topic) or performs a management operation, but the operation can't be completed because the replication lag between the primary and the secondary regions has exceeded the maximum allowed replication lag in seconds.
+- **Example**: The operation is being throttled because with it the new replication lag would reach 38,323 seconds, which is greater than the maximum replication lag that was set (300 seconds). The current replication lag for the latest operation being replicated is 0 seconds.
 - The replication queue for an entity exceeds its maximum size in bytes. The maximum size in bytes for a replication queue is an internal limit set by Service Bus.
 - **Example**: Replication queue size 73128000 exceeded threshold 67108864.
 - In synchronous replication, a request times out while waiting for another request to replicate.
-- **Example**: High volume of requests from client application for skarri-storage-exp1(westus3)/q1:MessagingJournal. Replication to other region(s) is in progress.
+- **Example**: High volume of requests from client application for skarri-storage-exp1(westus3)/q1:MessagingJournal. Replication to other regions is in progress.

 #### Resolution

@@ -221,26 +221,26 @@ We recommend that you follow these verification steps, depending on the type of
221221

222222
#### Cause
223223

224-
- A timeout exception in Geo DR means that the operation did not complete within the client-provided timeout.
224+
- A timeout exception in Geo DR means that the operation didn't complete within the client-provided timeout.
225225
- In synchronous replication, an operation’s primary region write and replication to secondary regions are within the scope of the operation’s timeout.
226-
- In asynchronous replication, an operation’s primary region write is within the scope of the operation’s timeout, but an operation’s replication to secondary regions is not within the scope of the operation’s timeout.
227-
- **Example**: The operation did not complete within the allocated time 00:01:00 for object message. (ServiceTimeout).
226+
- In asynchronous replication, an operation’s primary region write is within the scope of the operation’s timeout, but an operation’s replication to secondary regions isn't within the scope of the operation’s timeout.
227+
- **Example**: The operation didn't complete within the allocated time 00:01:00 for object message. (ServiceTimeout).
228228

229229
#### Resolution
230230

231231
- The client should retry the operation.
232-
- Note that some steps of a timed-out operation may have been completed. It’s possible that a timed-out operation may have been written to the primary region and some secondary regions. If an operation has been written to the primary region, it will eventually be replicated to all secondary regions regardless of client timeout.
232+
- Some steps of a timed-out operation might have been completed. It’s possible that a timed-out operation might have been written to the primary region and some secondary regions. If an operation has been written to the primary region, it will eventually be replicated to all secondary regions regardless of client timeout.
233233

234234
### BadRequest
235235

236236
#### Cause
237237

238-
- During a planned failover, the primary region is temporarily set as read-only in order to allow the secondary region to catch up. If the client attempts a write operation to the primary region while it is in this temporary read-only state, then the client will be receive a BadRequest exception.
238+
- During a planned failover, the primary region is temporarily set as read-only in order to allow the secondary region to catch up. If the client attempts a write operation to the primary region while it is in this temporary read-only state, then the client receives a BadRequest exception.
239239
- **Example**: Replication role switch in progress, primary replica:<entity-name> is ReadOnly.
240240

241241
#### Resolution
242-
- The client must wait for planned failover to complete before write operations will succeed.
243-
- In case planned failover takes too long, it is possible to trigger a forced failover instead.
242+
- The client must wait for planned failover to complete before write operations succeed.
243+
- In case planned failover takes too long, it's possible to trigger a forced failover instead.
244244

245245
## Next steps
246246

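The revised Resolution text tells the client to retry after a Geo-Replication timeout. The following C# configuration fragment is a sketch, assuming the Azure.Messaging.ServiceBus package, of where that SDK's built-in retry behavior is set: `ServiceBusClientOptions.RetryOptions`. The specific values are illustrative assumptions, not recommendations, and `<connection-string>` is a placeholder. `TryTimeout` limits each individual attempt, so it's presumably the client-provided timeout that the Cause text refers to.

```csharp
using System;
using Azure.Messaging.ServiceBus;

// Illustrative values only; tune them for your workload.
var options = new ServiceBusClientOptions
{
    RetryOptions = new ServiceBusRetryOptions
    {
        Mode = ServiceBusRetryMode.Exponential, // back off exponentially between attempts
        MaxRetries = 5,                         // retries after the initial attempt
        Delay = TimeSpan.FromSeconds(1),        // base delay used to compute the backoff
        MaxDelay = TimeSpan.FromSeconds(30),    // upper bound on any single delay
        TryTimeout = TimeSpan.FromSeconds(60)   // timeout for each individual attempt
    }
};

// "<connection-string>" is a placeholder; replace it with a real connection string.
await using var client = new ServiceBusClient("<connection-string>", options);
```

Because a timed-out operation might already have been written to the primary region, retrying a send can create a duplicate message. Enabling duplicate detection on the queue or topic and assigning a stable `MessageId` to each message is one way to absorb such duplicates.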