You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
## What does this pull request do?
Replace hardcoded array index-based patching with a search-and-replace
approach for updating ADOT instrumentation images in our EKS test
deployments.
The solution uses `jq` to find the correct argument by pattern matching.
### Problem
The current implementation uses hardcoded array indices to patch
deployment arguments:
- Java: `args[2]`
- Python: `args[3]`
- DotNet: `args[4]`
- NodeJS: `args[5]`
This approach is fragile and will break if:
- Arguments are reordered in the deployment
- New arguments are added before the image arguments
- The deployment structure changes
Which is what has happened
[here](https://github.com/aws-observability/aws-otel-java-instrumentation/actions/runs/15219049376/job/42813192472#step:26:316)
(notice new args were added in the deployment config).
### Before
```bash
kubectl patch deploy ... --type='json' \
-p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args/2", "value": "--auto-instrumentation-java-image=..."}]'
```
### After
```bash
kubectl get deploy ... -o json | \
jq '.spec.template.spec.containers[0].args |= map(if test("^--auto-instrumentation-java-image=") then "--auto-instrumentation-java-image=..." else . end)' | \
kubectl apply -f -
```
## Test strategy
### 1. Functional Testing with Real EKS Deployment
Retrieved actual deployment configuration from e2e-playground cluster
and verified both approaches produce identical results:
```bash
# Both approaches successfully update the image:
OLD: --auto-instrumentation-java-image=TEST_JAVA:v1.0.0 ✓
NEW: --auto-instrumentation-java-image=TEST_JAVA:v1.0.0 ✓
# NEW approach only modifies the targeted argument:
--auto-instrumentation-java-image=TEST_JAVA:v1.0.0 ✓ Changed
--auto-instrumentation-python-image=...v0.9.0 ✓ Unchanged
--auto-instrumentation-dotnet-image=...v1.7.0 ✓ Unchanged
--auto-instrumentation-nodejs-image=...v0.6.0 ✓ Unchanged
```
### 2. Edge Case Testing Results
Tested five critical edge cases with the actual deployment
configuration:
#### Non-existent argument patch
- **Test**: Try to patch `--auto-instrumentation-go-image` (doesn't
exist)
- **OLD approach**: Would fail with index out of bounds
- **NEW approach**: Safe no-op, no changes made
#### Reordered arguments
- **Test**: Swapped Java and Python argument positions
- **OLD approach**: Created duplicate Java entries, corrupted deployment
- **NEW approach**: Correctly found and updated only the Java argument
#### New arguments inserted
- **Test**: Added new flags before image arguments
- **OLD approach**: Patched `--new-feature-flag=enabled` instead of Java
image
- **NEW approach**: Still correctly found and patched Java image
#### Sequential patches
- **Test**: Applied multiple patches in sequence (simulating CI/CD)
- **Result**: Both Java and Python successfully updated without
conflicts
#### Malformed arguments
- **Test**: Replaced Java arg with malformed string
- **OLD approach**: Would blindly replace at index
- **NEW approach**: No match found, safely skipped
### 3. Test Commands Used
```bash
# Get real deployment
kubectl get deploy -n amazon-cloudwatch amazon-cloudwatch-observability-controller-manager -o json > deployment.json
# Test transformation
cat deployment.json | \
jq '.spec.template.spec.containers[0].args |= map(if test("^--auto-instrumentation-java-image=") then "--auto-instrumentation-java-image=NEW_IMAGE" else . end)'
# Run comprehensive edge case tests
./test-edge-cases.sh
```
### Test Files
-
[test-real-deployment.sh](https://paste.amazon.com/show/yiyuanh/1748359780)
- Testing with actual deployment configuration
- [test-edge-cases.sh](https://paste.amazon.com/show/yiyuanh/1748359814)
- Comprehensive edge case testing on real deployment
*Rollback procedure:*
We can safely rollback these changes by reverting the commit.
*Ensure you've run the following tests on your changes and include the
link below:*
To do so, create a `test.yml` file with `name: Test` and workflow
description to test your changes, then remove the file for your PR. Link
your test run in your PR description. This process is a short term
solution while we work on creating a staging environment for testing.
NOTE: TESTS RUNNING ON A SINGLE EKS CLUSTER CANNOT BE RUN IN PARALLEL.
See the
[needs](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idneeds)
keyword to run tests in succession.
- Run Java EKS on `e2e-playground` in us-east-1 and eu-central-2
- Run Python EKS on `e2e-playground` in us-east-1 and eu-central-2
- Run metric limiter on EKS cluster `e2e-playground` in us-east-1 and
eu-central-2
- Run EC2 tests in all regions
- Run K8s on a separate K8s cluster (check IAD test account for master
node endpoints; these will change as we create and destroy clusters for
OS patching)
By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
0 commit comments