Description
The current strategy is to use a versioned bridge, which allows us to bridge between whatever version of OpenTelemetry API is brought by the user, and our embedded OpenTelemetry API version (see diagram and doc).
This is a solid approach, but it does require bridging back-and-forth, which is not free.
An alternative is to put the (unshaded) OpenTelemetry API into the bootstrap class loader, and force applications to use that version:
- Upside: no bridging
- Downside: problems if the application uses a different version of OpenTelemetry API
This downside has kept us away from this approach previously.
But let's explore it a bit further anyways 😄.
What if the application is using a semver-compatible version of OpenTelemetry API? E.g. the agent puts OpenTelemetry API 1.3 into the bootstrap class loader, and the application uses OpenTelemetry API 1.1.
This should work, trusting OpenTelemetry API to strictly follow semver.
Do we trust OpenTelemetry API to follow semver strictly? If this would unlock new possibilities for us, we should take advantage of the fact that we are developing both the API and the Javaagent under the same umbrella, and put whatever safeguards are needed into place on the API to ensure this.
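To make the "semver-compatible" case concrete, here is a minimal sketch of the check. `ApiVersionCheck` and its methods are hypothetical names for illustration, not existing Javaagent code, and it assumes version strings of the form `major.minor[.patch][-qualifier]`.

```java
/** Hypothetical helper for illustration; not part of the Javaagent. */
final class ApiVersionCheck {

  private ApiVersionCheck() {}

  /** Parses "major.minor[.patch][-qualifier]" into {major, minor}. */
  static int[] parseMajorMinor(String version) {
    // Split on '.', '-' and '+' so "1.3.0-SNAPSHOT" yields ["1", "3", "0", "SNAPSHOT"].
    String[] parts = version.split("[.\\-+]");
    if (parts.length < 2) {
      throw new IllegalArgumentException("Expected at least major.minor: " + version);
    }
    return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
  }

  /**
   * Returns true if the API version in the bootstrap class loader can stand in for the
   * version the application was compiled against, assuming strict semver: same major
   * version, and the bootstrap minor version is at least the application's.
   */
  static boolean bootstrapCanServe(String bootstrapVersion, String appVersion) {
    int[] boot = parseMajorMinor(bootstrapVersion);
    int[] app = parseMajorMinor(appVersion);
    return boot[0] == app[0] && boot[1] >= app[1];
  }
}
```

With this sketch, `bootstrapCanServe("1.3", "1.1")` is true, while `bootstrapCanServe("1.1", "1.3")` and `bootstrapCanServe("2.0", "1.9")` are both false.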
Ok, but what if the application is using a newer version of OpenTelemetry API? E.g. the agent puts OpenTelemetry API 1.1 into the bootstrap class loader, and the application uses OpenTelemetry API 1.3.
In this case, we can identify that the class loader has a newer version of OpenTelemetry API, in which case our instrumentation isn't going to work with the newer version anyways, so we want to suppress our interop.
The open question here is: can we force that class loader to be parent last when it comes to OpenTelemetry API classes? (similar to how we currently force all class loaders to be "bootstrap first" when it comes to shaded OpenTelemetry API in order to deal with osgi-ish class loaders)
If we can do that, then the class loader won't use the OpenTelemetry API that the agent puts into the bootstrap class loader, and we won't break the application. We won't capture any telemetry from the newer OpenTelemetry API, but that wasn't going to happen anyways.
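For reference, this is roughly what "parent last for OpenTelemetry API classes" means in terms of delegation order. It is only a standalone sketch: the class name is hypothetical, and the agent would presumably have to get this behavior by instrumenting existing class loaders rather than by substituting a loader like this one. The two package prefixes are my assumption about which classes would need the treatment.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical loader that looks in its own URLs first for OpenTelemetry API classes. */
final class ParentLastForOtelApiClassLoader extends URLClassLoader {

  // Assumption: these prefixes cover the API classes that must not come from the parent.
  private static final String[] CHILD_FIRST_PREFIXES = {
    "io.opentelemetry.api.", "io.opentelemetry.context."
  };

  ParentLastForOtelApiClassLoader(URL[] urls, ClassLoader parent) {
    super(urls, parent);
  }

  static boolean isChildFirst(String className) {
    for (String prefix : CHILD_FIRST_PREFIXES) {
      if (className.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    if (!isChildFirst(name)) {
      // Everything else keeps the normal parent-first delegation.
      return super.loadClass(name, resolve);
    }
    synchronized (getClassLoadingLock(name)) {
      Class<?> loaded = findLoadedClass(name);
      if (loaded == null) {
        try {
          // Look in this loader's own URLs before asking the parent chain.
          loaded = findClass(name);
        } catch (ClassNotFoundException e) {
          // The application did not bring this class; fall back to the parent chain.
          return super.loadClass(name, resolve);
        }
      }
      if (resolve) {
        resolveClass(loaded);
      }
      return loaded;
    }
  }
}
```

Note that the fallback to the parent means an application that did not bring the API at all would still see the bootstrap copy.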
Ok, so what happens when OpenTelemetry API goes to 2.0, and the agent puts (unshaded) OpenTelemetry API 2.0 into the bootstrap class loader, and the application is still using OpenTelemetry API 1.x?
If we can figure out how to do the class loader trick above, then we can force the application to use OpenTelemetry API 1.x that it brought, and then apply the versioned bridge instrumentation to it.
I don't think we can escape the versioned bridge in this case.
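Putting the three scenarios together, the decision could be sketched as below. `InteropStrategy` and the `Mode` names are hypothetical, and treating an application on a newer major version the same as one on a newer minor version is my assumption; the discussion above only covers the newer-minor case.

```java
/** Hypothetical summary of the three scenarios; not existing Javaagent code. */
final class InteropStrategy {

  enum Mode {
    /** Application uses the unshaded API from the bootstrap class loader: no bridging. */
    SHARE_BOOTSTRAP_API,
    /** Application keeps its own (newer) API: no telemetry captured from it. */
    PARENT_LAST_NO_INTEROP,
    /** Application keeps its own (older major) API and the versioned bridge is applied. */
    PARENT_LAST_WITH_VERSIONED_BRIDGE
  }

  private InteropStrategy() {}

  static Mode choose(int agentMajor, int agentMinor, int appMajor, int appMinor) {
    if (appMajor < agentMajor) {
      // E.g. agent brings 2.0, application still uses 1.x.
      return Mode.PARENT_LAST_WITH_VERSIONED_BRIDGE;
    }
    if (appMajor > agentMajor || appMinor > agentMinor) {
      // E.g. agent brings 1.1, application uses 1.3.
      return Mode.PARENT_LAST_NO_INTEROP;
    }
    // E.g. agent brings 1.3, application uses 1.1.
    return Mode.SHARE_BOOTSTRAP_API;
  }
}
```

So only the first mode actually avoids the bridge; the other two both depend on the class loader trick being feasible.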
Now, what if we go down this path and find out after GA that we missed something, and we need to revert to the shaded OpenTelemetry API and versioned bridge in all cases? This change should not affect applications, unless the application fails to bring the OpenTelemetry API at all and relies purely on the Javaagent bringing it. That would be weird, as the application would fail to run without the Javaagent, but it may be possible in environments where you can't (easily) run the code outside of the environment (and you package it with opentelemetry-api excluded for some reason).
I'm not really sure how to guard against this without scanning all bytecode to see if it references OpenTelemetry API from inside a class loader that didn't bring OpenTelemetry API, and then not letting it access OpenTelemetry API in the bootstrap class loader (somehow, via ClassLoader.loadClass() instrumentation). I guess this scenario would also break when updating the Javaagent from 1.x to 2.x if an application was relying on 1.x being in the bootstrap class loader (not bringing its own OpenTelemetry API). I'm not sure this is a reason not to go down this path; I'm just trying to think through all the possibilities.
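As a partial illustration of "a class loader that didn't bring OpenTelemetry API", here is a sketch that checks whether a loader's own search path contains the API. `OwnApiDetector` is a hypothetical name, and the sketch only handles `URLClassLoader`, where `findResource` looks at the loader's own URLs without delegating to the parent. Other loader types would need a different mechanism, which I haven't worked out.

```java
import java.net.URLClassLoader;

/** Hypothetical check; only meaningful for URLClassLoader instances. */
final class OwnApiDetector {

  // Resource path of the API entry point interface, io.opentelemetry.api.OpenTelemetry.
  private static final String API_MARKER_RESOURCE = "io/opentelemetry/api/OpenTelemetry.class";

  private OwnApiDetector() {}

  /**
   * Returns true if the loader's own URLs contain the OpenTelemetry API. Returns false
   * for loaders that are not URLClassLoaders, which here means "unknown", not "absent".
   */
  static boolean bringsOwnApi(ClassLoader loader) {
    if (loader instanceof URLClassLoader) {
      // findResource does not delegate to the parent, unlike getResource.
      return ((URLClassLoader) loader).findResource(API_MARKER_RESOURCE) != null;
    }
    return false;
  }
}
```

This doesn't replace the bytecode scan for references to the API; it only covers the "did this loader bring it" half of the question.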