Skip to content

Conversation

@yiyuan-he
Copy link
Contributor

@yiyuan-he yiyuan-he commented May 27, 2025

What does this pull request do?

Replace hardcoded array index-based patching with a search-and-replace approach for updating ADOT instrumentation images in our EKS test deployments.

The solution uses jq to find the correct argument by pattern matching.

Problem

The current implementation uses hardcoded array indices to patch deployment arguments:

  • Java: args[2]
  • Python: args[3]
  • DotNet: args[4]
  • NodeJS: args[5]

This approach is fragile and will break if:

  • Arguments are reordered in the deployment
  • New arguments are added before the image arguments
  • The deployment structure changes

Which is what has happened here (notice new args were added in the deployment config).

Before

kubectl patch deploy ... --type='json' \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args/2", "value": "--auto-instrumentation-java-image=..."}]'

After

kubectl get deploy ... -o json | \
  jq '.spec.template.spec.containers[0].args |= map(if test("^--auto-instrumentation-java-image=") then "--auto-instrumentation-java-image=..." else . end)' | \
  kubectl apply -f -

Test strategy

1. Functional Testing with Real EKS Deployment

Retrieved actual deployment configuration from e2e-playground cluster and verified both approaches produce identical results:

# Both approaches successfully update the image:
OLD: --auto-instrumentation-java-image=TEST_JAVA:v1.0.0 ✓
NEW: --auto-instrumentation-java-image=TEST_JAVA:v1.0.0 ✓

# NEW approach only modifies the targeted argument:
--auto-instrumentation-java-image=TEST_JAVA:v1.0.0      ✓ Changed
--auto-instrumentation-python-image=...v0.9.0           ✓ Unchanged
--auto-instrumentation-dotnet-image=...v1.7.0           ✓ Unchanged
--auto-instrumentation-nodejs-image=...v0.6.0           ✓ Unchanged

2. Edge Case Testing Results

Tested five critical edge cases with the actual deployment configuration:

Non-existent argument patch

  • Test: Try to patch --auto-instrumentation-go-image (doesn't exist)
  • OLD approach: Would fail with index out of bounds
  • NEW approach: Safe no-op, no changes made

Reordered arguments

  • Test: Swapped Java and Python argument positions
  • OLD approach: Created duplicate Java entries, corrupted deployment
  • NEW approach: Correctly found and updated only the Java argument

New arguments inserted

  • Test: Added new flags before image arguments
  • OLD approach: Patched --new-feature-flag=enabled instead of Java image
  • NEW approach: Still correctly found and patched Java image

Sequential patches

  • Test: Applied multiple patches in sequence (simulating CI/CD)
  • Result: Both Java and Python successfully updated without conflicts

Malformed arguments

  • Test: Replaced Java arg with malformed string
  • OLD approach: Would blindly replace at index
  • NEW approach: No match found, safely skipped

3. Test Commands Used

# Get real deployment
kubectl get deploy -n amazon-cloudwatch amazon-cloudwatch-observability-controller-manager -o json > deployment.json

# Test transformation
cat deployment.json | \
  jq '.spec.template.spec.containers[0].args |= map(if test("^--auto-instrumentation-java-image=") then "--auto-instrumentation-java-image=NEW_IMAGE" else . end)'

# Run comprehensive edge case tests
./test-edge-cases.sh

Test Files

Rollback procedure:

We can safely rollback these changes by reverting the commit.

Ensure you've run the following tests on your changes and include the link below:

To do so, create a test.yml file with name: Test and workflow description to test your changes, then remove the file for your PR. Link your test run in your PR description. This process is a short term solution while we work on creating a staging environment for testing.

NOTE: TESTS RUNNING ON A SINGLE EKS CLUSTER CANNOT BE RUN IN PARALLEL. See the needs keyword to run tests in succession.

  • Run Java EKS on e2e-playground in us-east-1 and eu-central-2
  • Run Python EKS on e2e-playground in us-east-1 and eu-central-2
  • Run metric limiter on EKS cluster e2e-playground in us-east-1 and eu-central-2
  • Run EC2 tests in all regions
  • Run K8s on a separate K8s cluster (check IAD test account for master node endpoints; these will change as we create and destroy clusters for OS patching)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@yiyuan-he yiyuan-he changed the title feat: Search-and-Replace Image Patching [WIP] feat: Search-and-Replace Image Patching May 27, 2025
@yiyuan-he yiyuan-he requested a review from majanjua-amzn May 27, 2025 16:22
@yiyuan-he yiyuan-he merged commit 745f6e9 into aws-observability:main May 27, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants