feat: Search-and-Replace Image Patching (#409)

yiyuan-he · web-flow · commit 745f6e9925be · 2025-05-27T10:08:03.000-07:00
## What does this pull request do?

Replace hardcoded array-index-based patching with a search-and-replace approach for updating ADOT instrumentation images in our EKS test deployments. The solution uses `jq` to find the correct argument by pattern matching.

### Problem

The current implementation uses hardcoded array indices to patch deployment arguments:

- Java: `args[2]`
- Python: `args[3]`
- DotNet: `args[4]`
- NodeJS: `args[5]`

This approach is fragile and will break if:

- Arguments are reordered in the deployment
- New arguments are added before the image arguments
- The deployment structure changes

This has already happened [here](https://github.com/aws-observability/aws-otel-java-instrumentation/actions/runs/15219049376/job/42813192472#step:26:316) (note that new args were added to the deployment config).

### Before

```bash
kubectl patch deploy ... --type='json' \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args/2", "value": "--auto-instrumentation-java-image=..."}]'
```

### After

```bash
kubectl get deploy ... -o json | \
  jq '.spec.template.spec.containers[0].args |= map(if test("^--auto-instrumentation-java-image=") then "--auto-instrumentation-java-image=..." else . end)' | \
  kubectl apply -f -
```

## Test strategy

### 1. Functional Testing with Real EKS Deployment

We retrieved the actual deployment configuration from the `e2e-playground` cluster and verified that both approaches produce identical results:

```text
# Both approaches successfully update the image:
OLD: --auto-instrumentation-java-image=TEST_JAVA:v1.0.0 ✓
NEW: --auto-instrumentation-java-image=TEST_JAVA:v1.0.0 ✓

# NEW approach only modifies the targeted argument:
--auto-instrumentation-java-image=TEST_JAVA:v1.0.0 ✓ Changed
--auto-instrumentation-python-image=...v0.9.0 ✓ Unchanged
--auto-instrumentation-dotnet-image=...v1.7.0 ✓ Unchanged
--auto-instrumentation-nodejs-image=...v0.6.0 ✓ Unchanged
```

### 2. Edge Case Testing Results

We tested five critical edge cases against the actual deployment configuration:

#### Non-existent argument patch

- **Test**: Try to patch `--auto-instrumentation-go-image` (doesn't exist)
- **OLD approach**: Would fail with index out of bounds
- **NEW approach**: Safe no-op, no changes made

#### Reordered arguments

- **Test**: Swapped the Java and Python argument positions
- **OLD approach**: Created duplicate Java entries, corrupting the deployment
- **NEW approach**: Correctly found and updated only the Java argument

#### New arguments inserted

- **Test**: Added new flags before the image arguments
- **OLD approach**: Patched `--new-feature-flag=enabled` instead of the Java image
- **NEW approach**: Still correctly found and patched the Java image

#### Sequential patches

- **Test**: Applied multiple patches in sequence (simulating CI/CD)
- **Result**: Both Java and Python were updated successfully without conflicts

#### Malformed arguments

- **Test**: Replaced the Java arg with a malformed string
- **OLD approach**: Would blindly replace at the index
- **NEW approach**: No match found, safely skipped

### 3. Test Commands Used

```bash
# Get the real deployment
kubectl get deploy -n amazon-cloudwatch amazon-cloudwatch-observability-controller-manager -o json > deployment.json

# Test the transformation
jq '.spec.template.spec.containers[0].args |= map(if test("^--auto-instrumentation-java-image=") then "--auto-instrumentation-java-image=NEW_IMAGE" else . end)' deployment.json

# Run comprehensive edge case tests
./test-edge-cases.sh
```

### Test Files

- [test-real-deployment.sh](https://paste.amazon.com/show/yiyuanh/1748359780) - Testing with the actual deployment configuration
- [test-edge-cases.sh](https://paste.amazon.com/show/yiyuanh/1748359814) - Comprehensive edge case testing on the real deployment

*Rollback procedure:* We can safely roll back these changes by reverting the commit.
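The index-versus-prefix behavior described in the edge cases can be reproduced offline, without a cluster, by running both patch styles against a bare `args` array. What follows is a minimal sketch that assumes bash and `jq` 1.5+. The helper names `patch_by_index` and `patch_by_prefix`, the placeholder flags, and the image tags are all invented for illustration. It is not the linked `test-edge-cases.sh`. The sketch uses `startswith` rather than the regex `test("^...")` from this PR, and for these fixed prefixes the two match the same strings.

```shell
#!/usr/bin/env bash
# Offline sketch (not the linked test-edge-cases.sh): contrasts the old
# fixed-index patch with the new prefix match on a bare args array.
# Placeholder flags and image tags are invented for illustration.

# Old approach: overwrite whatever sits at index 2 (Java's slot before this PR).
patch_by_index() {   # usage: patch_by_index IMAGE <<<"$ARGS_JSON"
  jq -c --arg img "$1" '.[2] = "--auto-instrumentation-java-image=" + $img'
}

# New approach: rewrite only elements that start with the Java image flag.
patch_by_prefix() {  # usage: patch_by_prefix IMAGE <<<"$ARGS_JSON"
  jq -c --arg img "$1" '
    "--auto-instrumentation-java-image=" as $p
    | map(if startswith($p) then $p + $img else . end)'
}

# Layout the hardcoded index assumed: Java image at index 2.
ORIGINAL='["--some-flag=a","--other-flag=b","--auto-instrumentation-java-image=java:old","--auto-instrumentation-python-image=py:old"]'
# Same args with a new flag inserted ahead of the image arguments.
INSERTED='["--some-flag=a","--other-flag=b","--new-feature-flag=enabled","--auto-instrumentation-java-image=java:old","--auto-instrumentation-python-image=py:old"]'

for layout in ORIGINAL INSERTED; do
  echo "$layout"
  # ${!layout} expands the variable whose name is stored in $layout.
  echo "  index : $(patch_by_index java:new <<<"${!layout}")"
  echo "  prefix: $(patch_by_prefix java:new <<<"${!layout}")"
done
```

For `ORIGINAL`, both functions produce the same array. For `INSERTED`, the index-based version overwrites `--new-feature-flag=enabled` and leaves the `java:old` argument at index 3, so the array ends up holding two Java image flags. The prefix-based version still touches only the Java argument.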
*Ensure you've run the following tests on your changes and include the link below:*

To do so, create a `test.yml` file with `name: Test` and a workflow description to test your changes, then remove the file for your PR. Link your test run in your PR description. This process is a short-term solution while we work on creating a staging environment for testing.

NOTE: TESTS RUNNING ON A SINGLE EKS CLUSTER CANNOT BE RUN IN PARALLEL. See the [needs](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idneeds) keyword to run tests in succession.

- Run Java EKS on `e2e-playground` in us-east-1 and eu-central-2
- Run Python EKS on `e2e-playground` in us-east-1 and eu-central-2
- Run metric limiter on EKS cluster `e2e-playground` in us-east-1 and eu-central-2
- Run EC2 tests in all regions
- Run K8s on a separate K8s cluster (check IAD test account for master node endpoints; these will change as we create and destroy clusters for OS patching)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
diff --git a/.github/workflows/actions/patch_image_and_check_diff/action.yml b/.github/workflows/actions/patch_image_and_check_diff/action.yml
@@ -84,8 +84,11 @@ runs:
       if: ${{ inputs.repository == 'aws-otel-python-instrumentation' }}
       shell: bash
       run: |
-        kubectl patch deploy -n amazon-cloudwatch amazon-cloudwatch-observability-controller-manager --type='json' \
-        -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args/3", "value": "--auto-instrumentation-python-image=${{ inputs.patch-image-arn }}"}]'
+        # Get current deployment and update the Python image argument
+        kubectl get deploy -n amazon-cloudwatch amazon-cloudwatch-observability-controller-manager -o json | \
+        jq '.spec.template.spec.containers[0].args |= map(if test("^--auto-instrumentation-python-image=") then "--auto-instrumentation-python-image=${{ inputs.patch-image-arn }}" else . end)' | \
+        kubectl apply -f -
+        
         kubectl delete pods --all -n amazon-cloudwatch
         sleep 10
         kubectl wait --for=condition=Ready pod --all -n amazon-cloudwatch
@@ -98,8 +101,11 @@ runs:
       if: ${{ inputs.repository == 'aws-otel-java-instrumentation' }}
       shell: bash
       run: |
-        kubectl patch deploy -n amazon-cloudwatch amazon-cloudwatch-observability-controller-manager --type='json' \
-        -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args/2", "value": "--auto-instrumentation-java-image=${{ inputs.patch-image-arn }}"}]'
+        # Get current deployment and update the Java image argument
+        kubectl get deploy -n amazon-cloudwatch amazon-cloudwatch-observability-controller-manager -o json | \
+        jq '.spec.template.spec.containers[0].args |= map(if test("^--auto-instrumentation-java-image=") then "--auto-instrumentation-java-image=${{ inputs.patch-image-arn }}" else . end)' | \
+        kubectl apply -f -
+        
         kubectl delete pods --all -n amazon-cloudwatch
         sleep 10
         kubectl wait --for=condition=Ready pod --all -n amazon-cloudwatch
@@ -112,8 +118,11 @@ runs:
       if: ${{ inputs.repository == 'aws-otel-dotnet-instrumentation' }}
       shell: bash
       run: |
-        kubectl patch deploy -namazon-cloudwatch amazon-cloudwatch-observability-controller-manager --type='json' \
-        -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args/4", "value": "--auto-instrumentation-dotnet-image=${{ inputs.patch-image-arn }}"}]'
+        # Get current deployment and update the DotNet image argument
+        kubectl get deploy -n amazon-cloudwatch amazon-cloudwatch-observability-controller-manager -o json | \
+        jq '.spec.template.spec.containers[0].args |= map(if test("^--auto-instrumentation-dotnet-image=") then "--auto-instrumentation-dotnet-image=${{ inputs.patch-image-arn }}" else . end)' | \
+        kubectl apply -f -
+        
         kubectl delete pods --all -n amazon-cloudwatch
         sleep 10
         kubectl wait --for=condition=Ready pod --all -n amazon-cloudwatch
@@ -126,8 +135,11 @@ runs:
       if: ${{ inputs.repository == 'aws-otel-js-instrumentation' }}
       shell: bash
       run: |
-        kubectl patch deploy -namazon-cloudwatch amazon-cloudwatch-observability-controller-manager --type='json' \
-        -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args/5", "value": "--auto-instrumentation-nodejs-image=${{ inputs.patch-image-arn }}"}]'
+        # Get current deployment and update the Node.js image argument
+        kubectl get deploy -n amazon-cloudwatch amazon-cloudwatch-observability-controller-manager -o json | \
+        jq '.spec.template.spec.containers[0].args |= map(if test("^--auto-instrumentation-nodejs-image=") then "--auto-instrumentation-nodejs-image=${{ inputs.patch-image-arn }}" else . end)' | \
+        kubectl apply -f -
+        
         kubectl delete pods --all -n amazon-cloudwatch
         sleep 10
         kubectl wait --for=condition=Ready pod --all -n amazon-cloudwatch
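The four steps in this diff differ only in the language name. One possible consolidation is a single hypothetical `patch_image_arg` helper with three properties:

- It builds the flag from the language.
- It passes the image to `jq` via `--arg` instead of splicing it into the filter text.
- It exits non-zero when no argument matches.

The last point deliberately departs from this PR, which treats a missing argument as a safe no-op. In CI, a silent no-op would let the tests run against the unpatched image. This is a sketch assuming bash and `jq` 1.5+, and the sample Deployment below is trimmed and invented.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this commit. Reads a Deployment as JSON on
# stdin, rewrites the --auto-instrumentation-<LANG>-image= argument(s) of the
# first container, and writes the result to stdout. If no argument with that
# prefix exists, jq raises an error and exits non-zero (status 5).
# usage: patch_image_arg LANG IMAGE < deployment.json
patch_image_arg() {
  local lang="$1" image="$2"
  jq --arg lang "$lang" --arg image "$image" '
    ("--auto-instrumentation-" + $lang + "-image=") as $flag
    | if any(.spec.template.spec.containers[0].args // [] | .[]; startswith($flag))
      then .spec.template.spec.containers[0].args
             |= map(if startswith($flag) then $flag + $image else . end)
      else error("no argument starting with " + $flag)
      end'
}

# Offline demo with a trimmed, invented Deployment document.
SAMPLE='{"spec":{"template":{"spec":{"containers":[{"args":["--some-flag=a","--auto-instrumentation-java-image=java:old"]}]}}}}'

# prints ["--some-flag=a","--auto-instrumentation-java-image=java:new"]
patch_image_arg java java:new <<<"$SAMPLE" | jq -c '.spec.template.spec.containers[0].args'

# No Go image argument exists, so the helper fails instead of passing the
# document through unchanged.
if ! patch_image_arg go go:new <<<"$SAMPLE" >/dev/null; then
  echo "go: no matching argument, nothing applied" >&2
fi
```

Each `run:` step starts a fresh shell, so the helper would have to be sourced from a script file shipped with the action (a hypothetical layout) rather than defined in one step. A step could then run `kubectl get deploy -n amazon-cloudwatch amazon-cloudwatch-observability-controller-manager -o json | patch_image_arg python "${{ inputs.patch-image-arn }}" | kubectl apply -f -`. GitHub Actions runs `shell: bash` steps with `-eo pipefail`, so a `jq` error would fail the step rather than being hidden by the pipeline.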