Description
Describe the scenario
Consider a typical API policy with a backend, shown here as the effective policy to highlight the forward-request:
<policies>
    <inbound>
        <base />
        <set-backend-service backend-id="waitingfunction-backend" />
    </inbound>
    <backend>
        <forward-request timeout="60" />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

If an API has Application Insights enabled with the W3C correlation protocol selected, it is expected that:
- If the client provides a traceparent header, APIM:
  - reports the INBOUND request in Application Insights with the received traceparent;
  - reports the dependency to BACKEND with a traceparent carrying a new spanId;
  - passes that traceparent to the downstream service, so correlation succeeds.
- If the client DOES NOT provide a traceparent header, APIM:
  - reports the INBOUND request with a newly generated traceparent;
  - otherwise behaves the same as in the previous case.
In both cases, the transaction looks like this in Application Insights:

What happens if the forward-request and/or an additional send-request in the backend section run inside a wait policy? APIM breaks the correlation.
The API policy for this scenario looks roughly like this (payload and other logic omitted for clarity):
<policies>
    <inbound>
        <base />
        <set-backend-service backend-id="waitingfunction-backend" />
    </inbound>
    <backend>
        <wait for="all">
            <forward-request timeout="60" />
            <send-request mode="copy" timeout="60" ignore-error="false">
                <set-url>https://same.backend.as.forward.request</set-url>
                <set-method>POST</set-method>
                <set-header name="Content-Type" exists-action="override">
                    <value>application/json</value>
                </set-header>
            </send-request>
        </wait>
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

Inspecting the Application Insights requests and dependencies for this scenario, I notice that APIM splits the distributed trace into two correlation groups:
- If the client DOES NOT provide a traceparent header, APIM:
  - reports the INBOUND request with a new traceparent;
  - reports the dependency to BACKEND with a traceparent carrying a new spanId;
  - passes no traceparent to the downstream service configured as the backend, so correlation is broken.
- If the client provides a traceparent header, APIM:
  - reports the request in Application Insights with the received traceparent;
  - reports the dependency to BACKEND with a traceparent carrying a new spanId;
  - passes a traceparent to the downstream service that is somehow broken: the traceId is unchanged, but the link to the BACKEND dependency is lost.
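To tell the two cases apart when reading the telemetry, one option is to record the correlation headers exactly as APIM receives them, using the `trace` policy, which writes to Application Insights when a logger is attached to the API. The sketch below is a hypothetical inbound section for the policy above; the `correlation-debug` source name is arbitrary, and whether an `information`-level trace is exported depends on the verbosity configured for the API's Application Insights diagnostic.

```xml
<inbound>
    <base />
    <!-- Diagnostic only: log traceparent/tracestate as received from the client, "(none)" when absent.
         context.RequestId is included to locate the matching APIM request. -->
    <trace source="correlation-debug" severity="information">
        <message>@("request-id=" + context.RequestId + " traceparent=" + context.Request.Headers.GetValueOrDefault("traceparent", "(none)") + " tracestate=" + context.Request.Headers.GetValueOrDefault("tracestate", "(none)"))</message>
    </trace>
    <set-backend-service backend-id="waitingfunction-backend" />
</inbound>
```

With W3C correlation, Application Insights stores the trace-id (the second dash-separated field of the traceparent) as `operation_Id`, so comparing the logged value with the `operation_Id` of the APIM request and of the downstream telemetry should show where the chain is cut.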
For comparison with the previous scenario, this is the Application Insights view with the wait policy when the client provides a traceparent header. In my tests, the forward-request and the send-request reach the same service, so two distinct downstream requests appear.
Note that the downstream request originating from the forward-request (the one lasting 1s) does not appear as a child of the BACKEND dependency; instead, it is placed at the same level as the request originating from the send-request.

This is the Application Insights view with the wait policy when the client DOES NOT provide a traceparent header. Only the APIM transaction for the inbound request and the BACKEND dependency is shown here; the two downstream service requests each have their own traceparent, because APIM passes none.

Improvement to Project
There is no clear rationale for why the wait policy should change the default correlation behaviour with Application Insights.
A potential solution is to control explicitly how the traceparent is handled, especially how it is passed to downstream services by the forward-request and send-request inside the wait policy.
A first step could be a policy fragment that generates a traceparent if none is present, or replaces the span ID if the client provided one:
<fragment>
    <set-variable name="traceparent" value="@{
        var traceparent = context.Request.Headers.GetValueOrDefault("traceparent", "");
        if (string.IsNullOrEmpty(traceparent)) {
            // Generate a completely new traceparent if none exists
            var version = "00";
            var traceId = Guid.NewGuid().ToString("N").Substring(0, 32);
            var spanId = Guid.NewGuid().ToString("N").Substring(0, 16);
            var flags = "01"; // Sampled
            traceparent = $"{version}-{traceId}-{spanId}-{flags}";
        } else {
            // Parse existing traceparent and generate new span ID
            var parts = traceparent.Split('-');
            if (parts.Length == 4) {
                var version = parts[0];
                var traceId = parts[1];
                var newSpanId = Guid.NewGuid().ToString("N").Substring(0, 16);
                var flags = parts[3];
                traceparent = $"{version}-{traceId}-{newSpanId}-{flags}";
            }
            // If parsing fails, keep the original traceparent
        }
        return traceparent;
    }" />
    <set-variable name="tracestate" value="@(context.Request.Headers.GetValueOrDefault("tracestate", ""))" />
</fragment>

The corresponding API policy, with the forward-request and send-request inside the wait policy in the backend section, would then look something like this:
<policies>
    <inbound>
        <base />
        <include-fragment fragment-id="traceparent_initialize" />
        <set-header name="traceparent" exists-action="override">
            <value>@(context.Variables.GetValueOrDefault<string>("traceparent"))</value>
        </set-header>
        <set-header name="tracestate" exists-action="override">
            <value>@(context.Variables.GetValueOrDefault<string>("tracestate"))</value>
        </set-header>
        <set-backend-service backend-id="waitingfunction-backend" />
    </inbound>
    <backend>
        <wait for="all">
            <forward-request timeout="60" />
            <send-request mode="copy" response-variable-name="sendreq1res" timeout="60" ignore-error="false">
                <set-url>https://same.backend.as.forward.request</set-url>
                <set-method>POST</set-method>
                <set-header name="Content-Type" exists-action="override">
                    <value>application/json</value>
                </set-header>
                <!-- Propagate W3C correlation headers -->
                <set-header name="traceparent" exists-action="override">
                    <value>@(context.Variables.GetValueOrDefault<string>("traceparent"))</value>
                </set-header>
                <set-header name="tracestate" exists-action="override">
                    <value>@(context.Variables.GetValueOrDefault<string>("tracestate"))</value>
                </set-header>
                <set-body>...</set-body>
            </send-request>
        </wait>
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

This approach partially works when the client passes a traceparent.
This is what Application Insights shows for the requests to APIM when the client provides a traceparent header, even though the downstream service request does not have the proper parent:

It does not work when the client DOES NOT pass a traceparent, i.e. when the trace starts in APIM: the APIM request and dependency appear under a different traceparent than the one I created. Presumably APIM does not honor a traceparent that is set during inbound processing.
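One way to check this hypothesis is to log the value produced by the fragment next to the telemetry APIM emits. Because Application Insights uses the trace-id (second field of the traceparent) as `operation_Id` under W3C correlation, the trace-id in the logged message should equal the `operation_Id` of the APIM request if APIM honored the header. A minimal sketch of the inbound section, assuming the `traceparent_initialize` fragment from above; the `correlation-debug` source name is arbitrary:

```xml
<inbound>
    <base />
    <include-fragment fragment-id="traceparent_initialize" />
    <!-- Diagnostic only: log the traceparent generated by the fragment.
         The trace record itself should carry APIM's own operation_Id, so both can be compared side by side. -->
    <trace source="correlation-debug" severity="information">
        <message>@("generated traceparent=" + context.Variables.GetValueOrDefault<string>("traceparent", "(none)"))</message>
    </trace>
    <set-header name="traceparent" exists-action="override">
        <value>@(context.Variables.GetValueOrDefault<string>("traceparent"))</value>
    </set-header>
    <!-- tracestate handling as in the policy above -->
    <set-backend-service backend-id="waitingfunction-backend" />
</inbound>
```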
If this remains unsolved, it makes the inconsistency of the wait policy's different behavior even more evident.
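Separately from the correlation issue, the fragment above accepts any client value with four dash-separated parts, even when the fields have the wrong length or are not hex, and keeps an unparseable value unchanged. A stricter variant is sketched below (not validated against a live instance) as a hypothetical drop-in replacement for the `traceparent_initialize` fragment, assuming only the four-field W3C Trace Context format (`version-traceid-parentid-flags`, lowercase hex of 2, 32, 16 and 2 characters) needs to be accepted. It validates the header with `System.Text.RegularExpressions.Regex`, one of the types allowed in policy expressions; the negative lookaheads reject version `ff` and all-zero IDs, which the specification defines as invalid. The `traceparentIncomingValid` variable name is illustrative.

```xml
<fragment>
    <!-- True only for a well-formed four-field traceparent with non-zero IDs -->
    <set-variable name="traceparentIncomingValid" value="@(System.Text.RegularExpressions.Regex.IsMatch(context.Request.Headers.GetValueOrDefault("traceparent", ""), "^(?!ff)[0-9a-f]{2}-(?!0{32})[0-9a-f]{32}-(?!0{16})[0-9a-f]{16}-[0-9a-f]{2}$"))" />
    <set-variable name="traceparent" value="@{
        var newSpanId = Guid.NewGuid().ToString("N").Substring(0, 16);
        if ((bool)context.Variables["traceparentIncomingValid"]) {
            // Valid input: keep version, trace-id and flags, replace only the parent-id
            var parts = context.Request.Headers.GetValueOrDefault("traceparent", "").Split('-');
            return $"{parts[0]}-{parts[1]}-{newSpanId}-{parts[3]}";
        }
        // Missing or malformed: start a new trace (flags 01 = sampled)
        return $"00-{Guid.NewGuid().ToString("N")}-{newSpanId}-01";
    }" />
    <!-- Forward tracestate only when the incoming traceparent was valid; otherwise drop it with the old trace -->
    <set-variable name="tracestate" value="@((bool)context.Variables["traceparentIncomingValid"] ? context.Request.Headers.GetValueOrDefault("tracestate", "") : "")" />
</fragment>
```

Falling back to a fresh trace instead of keeping a malformed value means the downstream service always receives a header it can parse, at the cost of losing a client trace that was sent in a non-standard form.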
Are you able to collaborate and/or submit a pull request?
Yes