
Conversation

yiyuan-he
Contributor

What does this pull request do?

Bumps our OTel dependency versions to [1.33.0/0.54b0](https://github.com/open-telemetry/opentelemetry-python/releases/tag/v1.33.0) to support compatibility with third-party AI instrumentation libraries/frameworks such as OpenInference, Traceloop/OpenLLMetry, and OpenLit.

We do not bump to the latest upstream version [1.34.0/0.55b0](https://github.com/open-telemetry/opentelemetry-python/releases/tag/v1.34.0) because that release includes `BatchLogRecordProcessor` refactoring which is not compatible with our Caton changes.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@yiyuan-he yiyuan-he requested a review from a team as a code owner June 9, 2025 21:11
@yiyuan-he yiyuan-he merged commit e997cf8 into aws-observability:main Jun 13, 2025
11 checks passed
@yiyuan-he yiyuan-he deleted the bump-otel-dependencies-version branch June 13, 2025 18:00
yiyuan-he added a commit that referenced this pull request Jun 13, 2025
Reverts #388

## Why?
Bumping the OTel dependency versions is currently breaking our main build
due to spans not being generated correctly. For example, in an SNS call
we see that `aws.local.service` is not being populated correctly:
```
{
    "name": "testTopic send",
    "context": {
        "trace_id": "0x684c92d9eecb9548c12f90342875a8f3",
        "span_id": "0xfd714402fb0429f9",
        "trace_state": "[]"
    },
    "kind": "SpanKind.PRODUCER",
    "parent_id": "0xa6868c3dde9d4839",
    "start_time": "2025-06-13T21:06:33.612183Z",
    "end_time": "2025-06-13T21:06:33.920669Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {
        "rpc.system": "aws-api",
        "rpc.service": "SNS",
        "rpc.method": "Publish",
        "aws.region": "us-west-2",
        "server.address": "sns.us-west-2.amazonaws.com",
        "server.port": 443,
        "messaging.system": "aws.sns",
        "messaging.destination_kind": "topic",
        "messaging.destination": "arn:aws:sns:us-west-2:792479605405:testTopic",
        "messaging.destination.name": "arn:aws:sns:us-west-2:792479605405:testTopic",
        "aws.sns.topic.arn": "arn:aws:sns:us-west-2:792479605405:testTopic",
        "aws.request_id": "8184c44e-c6db-5998-a9d2-a48853c2dd94",
        "retry_attempts": 0,
        "http.status_code": 200
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.33.0",
            "service.name": "unknown_service",
            "cloud.provider": "aws",
            "cloud.platform": "aws_ec2",
            "cloud.account.id": "445567081046",
            "cloud.region": "us-east-1",
            "cloud.availability_zone": "us-east-1b",
            "host.id": "i-09dfcf17712adbde4",
            "host.type": "c5a.12xlarge",
            "host.name": "ip-172-31-43-64.ec2.internal",
            "telemetry.auto.version": "0.9.0.dev0-aws",
            "aws.local.service": "UnknownService"
        },
        "schema_url": ""
    }
}
{
    "name": "GET /server_request",
    "context": {
        "trace_id": "0x684c92d9eecb9548c12f90342875a8f3",
        "span_id": "0xa6868c3dde9d4839",
        "trace_state": "[]"
    },
    "kind": "SpanKind.SERVER",
    "parent_id": null,
    "start_time": "2025-06-13T21:06:33.610724Z",
    "end_time": "2025-06-13T21:06:33.920935Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {
        "http.method": "GET",
        "http.server_name": "127.0.0.1",
        "http.scheme": "http",
        "net.host.name": "localhost:8082",
        "http.host": "localhost:8082",
        "net.host.port": 8082,
        "http.target": "/server_request?param=.%2Fsample-applications%2Fsimple-client-server%2Fclient.py",
        "net.peer.ip": "127.0.0.1",
        "net.peer.port": 34778,
        "http.user_agent": "python-requests/2.32.2",
        "http.flavor": "1.1",
        "http.route": "/server_request",
        "http.status_code": 200
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.33.0",
            "service.name": "unknown_service",
            "cloud.provider": "aws",
            "cloud.platform": "aws_ec2",
            "cloud.account.id": "445567081046",
            "cloud.region": "us-east-1",
            "cloud.availability_zone": "us-east-1b",
            "host.id": "i-09dfcf17712adbde4",
            "host.type": "c5a.12xlarge",
            "host.name": "ip-172-31-43-64.ec2.internal",
            "telemetry.auto.version": "0.9.0.dev0-aws",
            "aws.local.service": "UnknownService"
        },
        "schema_url": ""
    }
}
```

Previously these contract tests were passing in the PR build as well as
locally with these dependency version bumps, so we are not sure why they
suddenly started failing. As a short-term mitigation, we will revert
these changes while we investigate further.
yiyuan-he added a commit that referenced this pull request Jun 16, 2025
…Setup (#398)

## What does this pull request do?
Fixes an issue where
[upgrading](#388)
our OTel dependency version from 1.27.0 caused all of our contract tests
to start
[failing](https://github.com/aws-observability/aws-otel-python-instrumentation/actions/runs/15640951584/job/44067918087)
in the main build.

The root cause was that in version
[1.28.0](https://github.com/open-telemetry/opentelemetry-python-contrib/releases/tag/v0.49b0)
the OpenTelemetry Python SDK migrated from `pkg_resources` to
`importlib_metadata` for entry point discovery. This was a [breaking
change](open-telemetry/opentelemetry-python-contrib#2871)
that had significant behavioral implications:
- **Before (pkg_resources):** Entry points were discovered in `sys.path`
order, meaning packages installed in the local test environment (e.g. a
venv) were always prioritized. This made ADOT discovery predictable and
consistent even without explicitly specifying `OTEL_PYTHON_DISTRO` and
`OTEL_PYTHON_CONFIGURATOR` in the contract test setup.
- **After (importlib_metadata):** Entry points are discovered in an
implementation-defined order that doesn't guarantee `sys.path` precedence.
In short, the discovery order depends on factors like filesystem iteration
order and installation timestamps, which can vary between environments.
This is why our contract tests were able to pass in the original PR build
to bump the OTel dependencies but then started failing in our main build
(see the sketch after this list).
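
To make the ordering difference concrete, here is a minimal stdlib-only sketch of how distro entry points are enumerated. It assumes Python 3.10+ for the `entry_points(group=...)` keyword (the `importlib_metadata` backport offers the same API); `opentelemetry_distro` is the entry point group that `opentelemetry-instrumentation` uses for distro discovery.

```python
# Minimal sketch: enumerate installed OpenTelemetry distro entry points.
# Assumes Python 3.10+ (or the importlib_metadata backport) for the
# entry_points(group=...) signature.
from importlib.metadata import entry_points

# Every installed package advertising an "opentelemetry_distro" entry point.
for ep in entry_points(group="opentelemetry_distro"):
    print(f"{ep.name} -> {ep.value}")

# pkg_resources walked sys.path in order, so a venv-local distro always won.
# importlib.metadata makes no such ordering promise: with both the upstream
# distro and aws-opentelemetry-distro installed, which one is listed (and
# therefore loaded) first can differ between environments unless
# OTEL_PYTHON_DISTRO pins the choice explicitly.
```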

Due to this unpredictable ordering, our ADOT SDK was not able to
instrument the sample apps in our contract tests correctly, which then
caused all of the test assertions to fail.

The solution is to explicitly configure the OpenTelemetry distro and
configurator in our contract test setup. This approach follows
OpenTelemetry's [official
recommendations](https://pypi.org/project/opentelemetry-instrumentation/)
for when multiple distros are present:
> If you have entry points for multiple distros or configurators present
in your environment, you should specify the entry point name of the
distro and configurator you want to be used via the OTEL_PYTHON_DISTRO
and OTEL_PYTHON_CONFIGURATOR environment variables.
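
As a hedged illustration (not the exact contract-test code), pinning the two variables when launching an instrumented sample app might look like the sketch below. The entry point names `aws_distro` and `aws_configurator` are the ones registered by `aws-opentelemetry-distro`; `app.py` is a hypothetical application.

```python
# Hedged sketch: pin the distro/configurator so entry point discovery order
# no longer matters. "aws_distro" / "aws_configurator" are the entry point
# names registered by aws-opentelemetry-distro.
import os
import subprocess

env = {
    **os.environ,
    "OTEL_PYTHON_DISTRO": "aws_distro",
    "OTEL_PYTHON_CONFIGURATOR": "aws_configurator",
}

# "app.py" is a hypothetical sample application; the real contract tests set
# these variables on the application container rather than a subprocess.
subprocess.run(
    ["opentelemetry-instrument", "python", "app.py"],
    env=env,
    check=True,
)
```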

**This fix will enable us to safely upgrade our OTel dependency version
from 1.27.0 which unblocks the Caton project.**


By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
yiyuan-he added a commit to yiyuan-he/aws-otel-python-instrumentation that referenced this pull request Jun 16, 2025
yiyuan-he added a commit to yiyuan-he/aws-otel-python-instrumentation that referenced this pull request Jun 16, 2025
…ility#397)

yiyuan-he added a commit to yiyuan-he/aws-otel-python-instrumentation that referenced this pull request Jun 16, 2025
…Setup (aws-observability#398)

