Skip to content

Conversation

@sdickhoven
Copy link
Contributor

@sdickhoven sdickhoven commented Feb 15, 2025

this pr fixes a race condition that occurs when new cni config files are created while the install-cni.sh script is executing anywhere here:

https://github.com/linkerd/linkerd2-proxy-init/blob/cni-plugin/v1.6.0/cni-plugin/deployment/scripts/install-cni.sh#L323-L348

we have observed this race condition several times in our eks clusters over the past few days where we are chaining cilium to the aws vpc cni.

the install-cni.sh script simply fails to patch the cilium cni config sometimes. i.e. cilium and linkerd-cni must be starting up and manipulating /etc/cni/net.d at just about the same time.

i have a temporary workaround for this race condition in the form of the following kustomize patch:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: linkerd-cni
resources:
- manifest.yaml
patches:
- patch: |
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: linkerd-cni
      namespace: linkerd-cni
    spec:
      template:
        spec:
          containers:
          - name: install-cni
            lifecycle:
              postStart:
                exec:
                  command:
                  - /bin/sh
                  - -c
                  - |
                    sleep 10 # wait for `install-cni.sh` to enter into `inotifywait` loop
                    exec &> /tmp/cni_config_repair.out
                    set -x
                    HOST_CNI_NET="${CONTAINER_MOUNT_PREFIX:-/host}${DEST_CNI_NET_DIR:-/etc/cni/net.d}"
                    find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \) -print0 |
                    while read -r -d $'\0' file
                    do
                      sleep 2 # try to avoid interfering with non-atomic filesystem operations
                      grep -qw linkerd "${file}" && continue || echo "Found unpatched CNI config file: ${file}"
                      tmp_file="$(mktemp -ut linkerd-cni.XXXXXX)"
                      cp -fp "${file}" "${tmp_file}"
                      echo >> "${tmp_file}" # force hash diff
                      mv -f "${tmp_file}" "${file}" # IMPORTANT: use atomic filesystem operation!
                    done

this pr furthermore addresses several additional race conditions that can occur when two or more cni config file are created in short succession and/or when a cni config file is created and then removed immediately afterwards.

@sdickhoven sdickhoven requested a review from a team as a code owner February 15, 2025 03:24
@sdickhoven sdickhoven force-pushed the BUG-fix-race-condition branch from bf4f198 to a72d7b3 Compare February 15, 2025 04:56
@sdickhoven sdickhoven force-pushed the BUG-fix-race-condition branch 2 times, most recently from 4764f46 to 64fef36 Compare February 19, 2025 16:40
Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sdickhoven for this. This certainly makes the logic more robust an less repetitive 👍
Can I ask you to please create a separate PR for the changes unrelated to the fix? We always squash commits when merging, so it'd be nice to have those as a separate commit.
You can disregard for now the CI failure; we're currently addressing that in other PRs.

@sdickhoven sdickhoven force-pushed the BUG-fix-race-condition branch from 9c26082 to 438915e Compare February 20, 2025 15:14
@sdickhoven
Copy link
Contributor Author

hi @alpeb 👋

i have taken out the unrelated changes. ✅

happy to submit another pr for those unrelated changes but they are 100% cosmetic and 0% functional. so there's really no need.

@sdickhoven
Copy link
Contributor Author

sdickhoven commented Feb 21, 2025

by the way, i just thought of a way in which install-cni.sh can go into an infinite patching loop.

specifically, the current sha256sum logic assumes that there is only ever going to be one cni config file.

picture this scenario: two new cni config files are created at very nearly the same time.

let's call them

  • cni1.conflist
  • cni2.conflist

sync() updates cni1.conflist and sets cni_conf_sha to the sha sum for that file.

moving the updated cni1.conflist file into place triggers another inotifywait event.

but before we can deal with that inotifywait event, we first have to deal with the one that is already waiting in the queue for cni2.conflist...

sync() updates cni2.conflist and sets cni_conf_sha to the sha sum for that file.

moving the updated cni2.conflist file into place triggers another inotifywait event.

but before we can deal with that inotifywait event, we first have to deal with the one that is already waiting in the queue for cni1.conflist...

...and since cni_conf_sha now contains the sha sum of cni2.conflist, sync() will not detect that it has already updated cni1.conflist...

so the cycle repeats... forever.

should i fix this behavior in this pr or create a new pr?

it is technically a different error case (not a race condition but an infinite loop).

i would address this by updating the sha sum logic to store all sha sums in a variable so that the sha sum for the correct file can be looked up.

if you want me to submit a separate pr, i'm going to wait until this pr is merged... because the required changes would be different for pre-/post-merge install-cni.sh scripts.

@sdickhoven
Copy link
Contributor Author

sdickhoven commented Feb 22, 2025

this is what my fix for the infinite patching loop would look like (using this pr as base):

diff --git a/cni-plugin/deployment/scripts/install-cni.sh b/cni-plugin/deployment/scripts/install-cni.sh
index cdf166c..b6156d7 100755
--- a/cni-plugin/deployment/scripts/install-cni.sh
+++ b/cni-plugin/deployment/scripts/install-cni.sh
@@ -264,14 +264,16 @@ sync() {
 
 # monitor_cni_config starts a watch on the host's CNI config directory
 monitor_cni_config() {
+  local new_sha
   inotifywait -m "${HOST_CNI_NET}" -e create,moved_to,modify |
     while read -r directory action filename; do
       if [[ "$filename" =~ .*.(conflist|conf)$ ]]; then 
         log "Detected change in $directory: $action $filename"
-        sync "$filename" "$action" "$cni_conf_sha"
+        sync "$filename" "$action" "$(jq -r --arg file "$filename" '.[$file] | select(.)' <<< "$cni_conf_sha")"
         # calculate file SHA to use in the next iteration
         if [[ -e "$directory/$filename" ]]; then
-          cni_conf_sha="$(sha256sum "$directory/$filename" | while read -r s _; do echo "$s"; done)"
+          new_sha="$(sha256sum "$directory/$filename" | while read -r s _; do echo "$s"; done)"
+          cni_conf_sha="$(jq -c --arg file "$filename" --arg sha "$new_sha" '. * {$file: $sha}' <<< "$cni_conf_sha")"
         fi
       fi
     done
@@ -315,7 +317,7 @@ install_cni_bin
 # Otherwise, new CNI config files can be created just _after_ the initial round
 # of patching and just _before_ we set up the `inotifywait` loop to detect new
 # CNI config files.
-cni_conf_sha="__init__"
+cni_conf_sha='{}'
 monitor_cni_config &
 
 # Append our config to any existing config file (*.conflist or *.conf)

this will maintain cni config sha hashes in a json object like this:

{
  "05-cilium.conflist" : "0a08ee0b9360e2ee2c3ed1d83263a3168832101346d0528a2474c3f80b7c73d6",
  "10-aws.conflist"    : "7ed380c9100362003cde9861cc6ef09307245eba3ea963cdba36186a30284acd"
}

the select(.) rejects null as output.

since the string null is not a valid sha sum, it's not really necessary to do this but seems like the right thing to do. 🤷

$ filename=abc
$ jq -r --arg file "$filename" '.[$file]' <<< '{}'
null
$ jq -r --arg file "$filename" '.[$file] | select(.)' <<< '{}'
$ jq -r --arg file "$filename" '.[$file] | select(.)' <<< '{"abc":"def"}'
def

also <<< only works with bash, not sh. if you prefer we can change

... <<< "$cni_conf_sha"

to

... < <(echo "$cni_conf_sha")

or

echo "$cni_conf_sha" | ...

@alpeb
Copy link
Member

alpeb commented Mar 5, 2025

Sorry for the late reply... As you correctly realized, this script was designed supposing the existence of a single cni config file. I'd like to spend more time investigating how linkerd-cni should behave in the presence of multiple such files before evaluating your solution. So let's for now leave your last diff posted as a comment out of the current PR. I'll give another look and round of tests to the current commits in the following days, so we can move forward.

@alpeb
Copy link
Member

alpeb commented Mar 7, 2025

Testing this results in the following output:

$ k -n linkerd-cni logs -f linkerd-cni-sq6cf
[2025-03-07 15:29:23] Wrote linkerd CNI binaries to /host/opt/cni/bin
Setting up watches.
Watches established.
[2025-03-07 15:29:23] Trigger CNI config detection for /host/etc/cni/net.d/10-calico.conflist
[2025-03-07 15:29:23] Detected change in /host/etc/cni/net.d/: CREATE 10-calico.conflist
Setting up watches.
Watches established.
[2025-03-07 15:29:23] New/changed file [10-calico.conflist] detected; re-installing
[2025-03-07 15:29:23] Using CNI config template from CNI_NETWORK_CONFIG environment variable.
[2025-03-07 15:29:23] CNI config: {
  "name": "linkerd-cni",
  "type": "linkerd-cni",
  "log_level": "debug",
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig"
  },
  "linkerd": {
    "incoming-proxy-port": 4143,
    "outgoing-proxy-port": 4140,
    "proxy-uid": 2102,
    "ports-to-redirect": [],
    "inbound-ports-to-ignore": ["4191","4190"],
    "simulate": false,
    "use-wait-flag": true,
    "iptables-mode": "legacy",
    "ipv6": false
  }
}
[2025-03-07 15:29:23] Created CNI config /host/etc/cni/net.d/10-calico.conflist
[2025-03-07 15:29:23] Detected change in /host/etc/cni/net.d/: MODIFY 10-calico.conflist
[2025-03-07 15:29:23] Ignoring event: MODIFY /host/etc/cni/net.d/10-calico.conflist; no real changes detected
[2025-03-07 15:29:23] Detected change in /host/etc/cni/net.d/: CREATE 10-calico.conflist
[2025-03-07 15:29:23] Ignoring event: CREATE /host/etc/cni/net.d/10-calico.conflist; no real changes detected
[2025-03-07 15:29:23] Detected change in /host/etc/cni/net.d/: MODIFY 10-calico.conflist
[2025-03-07 15:29:23] Ignoring event: MODIFY /host/etc/cni/net.d/10-calico.conflist; no real changes detected

Those extra events at the bottom are new with this change. Given that we're now calling monitor_cni_config & earlier, the entire block that follows it should no longer be necessary, right?

@sdickhoven
Copy link
Contributor Author

sdickhoven commented Mar 7, 2025

Given that we're now calling monitor_cni_config & earlier, the entire block that follows it should no longer be necessary, right?

which block are you referring to?

this?:

# Append our config to any existing config file (*.conflist or *.conf)
config_files=$(find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \))
if [ -z "$config_files" ]; then
    log "No active CNI configuration files found"
else
  config_file_count=$(echo "$config_files" | grep -v linkerd | sort | wc -l)
  if [ "$config_file_count" -eq 0 ]; then
    log "No active CNI configuration files found"
  else
    find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \) -print0 |
      while read -r -d $'\0' file; do
        log "Installing CNI configuration for $file"
...

this block is definitely needed!

because cni config files can already exist before install-cni.sh even starts up. and you would never get an event from inotifywait for those files that already exist in /etc/cni/net.d.

the logic is:

  1. set up inotifywait
  2. force any already existing files to trigger inotifywait

i guess that the change in the block that follows monitor_cni_config & is not strictly necessary (because all that really matters is that the script dosn't miss new cni config files).

but i figured that consolidating the patching logic made sense.

does that answer your question? apologies if i did not understand your question correctly.

the log output looks exactly like what i would expect from my changes.

@sdickhoven
Copy link
Contributor Author

sdickhoven commented Mar 7, 2025

oh wait... i think i understand what you're asking...

Those extra events at the bottom are new with this change.

you are right. i would not expect those last two events. however, i thought that this was maybe your ci check making sure that trying to update the same file twice would not lead to double-patching. 🤷

if that's not the case then you are correct: those last two events shouldn't happen and i'm not sure where they would be coming from. 😕

...but only the last two. the two events before the last two are expected and a result of this patch. yes.

before this change, monitor_cni_config was not already running when install-cni.sh started patching existing cni config files.

so there wouldn't have been an inotifywait event associated with that activity.

now, when install-cni.sh patches an existing cni config file, inotifywait is already watching for new/modified files.

so the act of patching will trigger inotifywait. that part is new.

so:

1st inotifywatch event comes from:

log "Trigger CNI config detection for $file"
...
mv -f "$tmp_file" "$file"

2nd inotifywait event comes from install_cni_conf actually patching that file and moving the patched file into place:

https://github.com/linkerd/linkerd2-proxy-init/blob/proxy-init/v2.4.2/cni-plugin/deployment/scripts/install-cni.sh#L219

both of these events did not show up before because monitor_cni_config was not running yet (and was not used to accomplish the actual patching).

but, as you can see, the second event is ignored (as expected/intended).

but then there are two more CREATE/MODIFY events at the very end that i can't explain... again, i thought this was a ci check making sure that the shasum logic is doing the right thing. 🤷

@sdickhoven
Copy link
Contributor Author

i hope this answers your question. please let me know if i'm not making sense.

@sdickhoven
Copy link
Contributor Author

any progress on this?

cni chaining is kind of a thing and it would be good for linkerd to work correctly in environments where cni chaining is already in place... i.e. where linkerd isn't the only chained cni plugin.

chaining cilium to aws vpc cni is definitely a pretty common setup based on what i see in the cilium slack workspace.

let me know if there's anything else i can do to help get this very real race condition fixed.

@alpeb
Copy link
Member

alpeb commented May 26, 2025

Hi, thanks for pinging back; looking again into this...
You're right about requiring the logic to be triggered for the initial file. Quick question: is there a more atomic way to trigger that than having the copy followed by the move? Would just calling touch on the file trigger the sync? (that'll likely require adding another event to the list of inotifywait events).

As a side note, I've just realized libcni recently released a way of doing safe subdirectory-based plugin config loading, which would save us from all this config patching nonsense... but that approach will have to wait till it gets picked up by the major cloud providers. Perhaps we could have users opt into that behavior. Not asking you to implement any of that, just wanted to share what I found out 🙂

@sdickhoven
Copy link
Contributor Author

sdickhoven commented May 27, 2025

is there a more atomic way to trigger that than having the copy followed by the move?

yes. good point. touching a file would be cleaner indeed. something like this should work:

root@5a06a8dd2c88:~# inotifywait -m . -e moved_to,close_write &
[1] 211
Setting up watches.
Watches established.

root@5a06a8dd2c88:~# touch ttt
./ CLOSE_WRITE,CLOSE ttt
root@5a06a8dd2c88:~# echo > abc
./ CLOSE_WRITE,CLOSE abc
root@5a06a8dd2c88:~# cat ttt
root@5a06a8dd2c88:~# echo > def
./ CLOSE_WRITE,CLOSE def
root@5a06a8dd2c88:~# touch def
./ CLOSE_WRITE,CLOSE def
root@5a06a8dd2c88:~# touch /tmp/ttt
root@5a06a8dd2c88:~# mv /tmp/ttt .
./ MOVED_TO ttt
root@5a06a8dd2c88:~# rm ttt
root@5a06a8dd2c88:~#

I've just realized libcni recently released a way of doing safe subdirectory-based plugin config loading, which would save us from all this config patching nonsense

😍 that's great! i hope it becomes available soon. that would certainly make things easier.

do you want me to update my pr with the touch logic (which requires an update the inotifywait events we want to watch for)?

also, do you want me to open another pr for the other race condition i pointed out about the infinite patching loop due to the flawed shasum logic?

if so, then i'd want to wait until this pr is merged before i submit the second pr.

@alpeb
Copy link
Member

alpeb commented May 27, 2025

Yes, please update the PR with the touch logic. As for the looping issue, I think we're gonna hold on further logical changes into this script, as doing concurrency handling in bash is getting a little too complex. We'll be considering rewriting this in the future in a proper language, or perhaps start relying on the new libcni feature I mentioned, after getting a clearer picture about its availability.

@sdickhoven
Copy link
Contributor Author

sdickhoven commented May 27, 2025

As for the looping issue, I think we're gonna hold on further logical changes into this script, as doing concurrency handling in bash is getting a little too complex.

i personally think that this is a mistake and here's why:

handling the concurrency (whether in bash or in another language) is actually not very complex.

the inotifywait serializes all the filesystem events which makes them very easy to deal with... i.e. the bash code in question executes entirely sequentially.

the only thing you have to do is make sure that your logic can deal with filesystem events for multiple files in any order.

and i can trivially provoke an infinite patching loop (given the current shasum logic) by running the following two commands in quick succession:

echo '{"cniVersion": "0.4.0", "name": "aws-cni", "disableCheck": true, "plugins": [{"name": "foo"}]}' > /etc/cni/net.d/01-foo.conflist
echo '{"cniVersion": "0.4.0", "name": "aws-cni", "disableCheck": true, "plugins": [{"name": "bar"}]}' > /etc/cni/net.d/02-bar.conflist

...which i just did on one of my worker nodes and it sent linkerd-cni into said infinite patching loop:

[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: CREATE 01-foo.conflist
[2025-05-27 19:32:18] New/changed file [01-foo.conflist] detected; re-installing
[2025-05-27 19:32:18] Using CNI config template from CNI_NETWORK_CONFIG environment variable.
[2025-05-27 19:32:18] CNI config: {
  "name": "linkerd-cni",
  "type": "linkerd-cni",
  "log_level": "info",
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig"
  },
  "linkerd": {
    "incoming-proxy-port": 4143,
    "outgoing-proxy-port": 4140,
    "proxy-uid": 2102,
    "ports-to-redirect": [],
    "inbound-ports-to-ignore": ["4191","4190","25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "outbound-ports-to-ignore": ["25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "simulate": false,
    "use-wait-flag": false,
    "iptables-mode": "plain",
    "ipv6": false
  }
}
[2025-05-27 19:32:18] Created CNI config /host/etc/cni/net.d/01-foo.conflist
[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: MODIFY 01-foo.conflist
[2025-05-27 19:32:18] Ignoring event: MODIFY /host/etc/cni/net.d/01-foo.conflist; no real changes detected
[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: CREATE 02-bar.conflist
[2025-05-27 19:32:18] New/changed file [02-bar.conflist] detected; re-installing
[2025-05-27 19:32:18] Using CNI config template from CNI_NETWORK_CONFIG environment variable.
[2025-05-27 19:32:18] CNI config: {
  "name": "linkerd-cni",
  "type": "linkerd-cni",
  "log_level": "info",
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig"
  },
  "linkerd": {
    "incoming-proxy-port": 4143,
    "outgoing-proxy-port": 4140,
    "proxy-uid": 2102,
    "ports-to-redirect": [],
    "inbound-ports-to-ignore": ["4191","4190","25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "outbound-ports-to-ignore": ["25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "simulate": false,
    "use-wait-flag": false,
    "iptables-mode": "plain",
    "ipv6": false
  }
}
[2025-05-27 19:32:18] Created CNI config /host/etc/cni/net.d/02-bar.conflist
[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: MODIFY 02-bar.conflist
[2025-05-27 19:32:18] Ignoring event: MODIFY /host/etc/cni/net.d/02-bar.conflist; no real changes detected
[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: CREATE 01-foo.conflist
[2025-05-27 19:32:18] New/changed file [01-foo.conflist] detected; re-installing
[2025-05-27 19:32:19] Using CNI config template from CNI_NETWORK_CONFIG environment variable.
[2025-05-27 19:32:19] CNI config: {
  "name": "linkerd-cni",
  "type": "linkerd-cni",
  "log_level": "info",
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig"
  },
  "linkerd": {
    "incoming-proxy-port": 4143,
    "outgoing-proxy-port": 4140,
    "proxy-uid": 2102,
    "ports-to-redirect": [],
    "inbound-ports-to-ignore": ["4191","4190","25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "outbound-ports-to-ignore": ["25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "simulate": false,
    "use-wait-flag": false,
    "iptables-mode": "plain",
    "ipv6": false
  }
}
[2025-05-27 19:32:19] Created CNI config /host/etc/cni/net.d/01-foo.conflist
[2025-05-27 19:32:19] Detected change in /host/etc/cni/net.d/: MODIFY 01-foo.conflist
[2025-05-27 19:32:19] Ignoring event: MODIFY /host/etc/cni/net.d/01-foo.conflist; no real changes detected
[2025-05-27 19:32:19] Detected change in /host/etc/cni/net.d/: CREATE 02-bar.conflist
...

...and it's perfectly plausible that two cni plugin files are created in short succession.

the race condition that is fixed by the current version of my pr may be "masking" the infinite patch loop race condition to a large extent.

so fixing this race condition will probably make the other race condition more likely to occur.

anyway, my patch to address the infinite patch loop is quite trivial imo. and i would personally include it in this pr.

but... i don't want to talk you into doing something that you're not comfortable with. so... your call.

log "Trigger CNI config detection for $file"
# The following will trigger the `sync()` function via `inotifywait` in
# `monitor_cni_config()`.
touch "$file"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi: there may be filesystems out there that round the mtime attribute to seconds.

see kubernetes/minikube#1594

this may render a touch ineffective (because the old and new mtime could end up being the same which may then not trigger an ATTRIB event) but i have no way of knowing/testing that.

the safest / most universally compatible way to do this is probably what i had before. but this is definitely "cleaner".

up to you if you want to keep the touch logic or revert back to the mv logic. happy to revert if you tell me to.

@sdickhoven
Copy link
Contributor Author

sdickhoven commented May 27, 2025

but... i don't want to talk you into doing something that you're not comfortable with. so... your call.

also... no rush. ☺️

we currently have the workaround outlined in the pr description deployed in all of our clusters. so we are fine. ✅

if you'd rather rewrite the cni patch logic in golang or wait for the new subdirectory-based plugin config, that's fine by us.

but other linkerd-cni users may run into this race condition and won't be able to effectively troubleshoot / mitigate it. 🤷

@alpeb
Copy link
Member

alpeb commented May 28, 2025

Thanks for the detailed repro for the infinite loop issue. I wasn't entirely clear on how multiple config files would coexist under /etc/cni/net.d (leaving aside the new libcni drop-in config stuff, as if IIUC the separate config files would be under separate directories, so it shouldn't affect our existing inotifywait setup). So here's a summary of what I tested out, just for reference.

I set up a test EKS cluster with aws-cni and cilium-cni. When installing cilium, I see it copies aws-cni's conflist file and appends its own config into its plugins array, naming the new file with a higher precedence so it overrides the original conflist:

root@ip-172-31-61-213:/host/etc/cni/net.d# ll
total 8
drwx------. 2 root root   55 May 28 14:44 ./
drwxr-xr-x. 3 root root   19 May 19 17:13 ../
-rw-r--r--. 1 root root 1034 May 28 14:44 05-cilium.conflist
-rw-r--r--. 1 root root  906 May 28 14:37 10-aws.conflist

The original conflist (10-aws.conflist) should no longer be modified, so I guess this is why we haven't encountered the condition you describe (btw this is probably a better approach than mutating files, but that's another discussion).

That being said, we have no guarantees about what touches those files and I think you made a good case about your infinite loop remediation which should provide improved guardrails.

As for the mtime resolution, that's a good find too 👍 We don't wanna open a hole for another edge case. Having to interact with the file system like that though still feels a little hacky to me. What do you think about calling a function instead? Such function would get reused inside monitor_cni_config(), whose only concern would end up being setting up inotifywait.

@sdickhoven
Copy link
Contributor Author

sdickhoven commented May 28, 2025

btw this is probably a better approach than mutating files

it's not bad but it comes with its own set of challenges. 🙁

but, yeah, instead of using shasum to keep track of which files have already been patched, we could just check if a newly created file in /etc/cni/net.d is the alphabetically "lowest" one in that folder (based on whatever criteria the kubernetes cni uses); and if not, simply ignore it.

Having to interact with the file system like that though still feels a little hacky to me. What do you think about calling a function instead? Such function would get reused inside monitor_cni_config(),

yes. we basically already have that: sync(). but monitor_cni_config() does implement some logic that should probably just go into sync() and then we could just call sync() from inside the find ... | while read file; do ... loop instead of triggering the inotifywait in monitor_cni_config() with a mv or a touch.

there are several things about this script that are a bit kludgy and inconsistent that i would be happy to clean up but i didn't want to make such a big change that you would be reluctant to accept my pr.

i.e. i wanted to make as small a change as possible that basically preserves most of the existing logic to make it easier for you to reason about my change.

but i've been writing shell scripts since the early 90s so i feel more than comfortable refactoring (at least some aspects of) this script if you are ok with that.

... whose only concern would end up being setting up inotifywait.

that wouldn't work without some exceptionally ugly kludges (because bash isn't a "real" programming language)... so, no to that.

...or maybe you meant to say "trigger" (instead of "setting up") the inotifywait 🤷

i.e. have a separate function that does

log "Trigger CNI config detection for $file"
tmp_file="$(mktemp -u /tmp/linkerd-cni.patch-candidate.XXXXXX)"
cp -fp "$file" "$tmp_file"
# The following will trigger the `sync()` function via `inotifywait` in
# `monitor_cni_config()`.
mv "$tmp_file" "$file"

@sdickhoven
Copy link
Contributor Author

sdickhoven commented May 29, 2025

so... i found a number of additional race conditions in this script.

none of them have to do with bash but with the fact that we are dealing with a filesystem. so you have to really think about atomicity... which we would have to do in e.g. golang as well.

anyway... below is a diff based on the current pr of all the changes that i would like to submit in order to remove all (except for one very unlikely) race conditions (that i have discovered while refactoring)... plus a few formatting nits:

diff --git a/cni-plugin/deployment/scripts/install-cni.sh b/cni-plugin/deployment/scripts/install-cni.sh
index fe2ffa0..f0ba4f5 100755
--- a/cni-plugin/deployment/scripts/install-cni.sh
+++ b/cni-plugin/deployment/scripts/install-cni.sh
@@ -25,7 +25,7 @@
 # - Expects the desired CNI config in the CNI_NETWORK_CONFIG env variable.
 
 # Ensure all variables are defined, and that the script fails when an error is hit.
-set -u -e -o pipefail
+set -u -e -o pipefail +o noclobber
 
 # Helper function for raising errors
 # Usage:
@@ -77,7 +77,7 @@ cleanup() {
   # Find all conflist files and print them out using a NULL separator instead of
   # writing each file in a new line. We will subsequently read each string and
   # attempt to rm linkerd config from it using jq helper.
-  local cni_data=''
+  local cni_data
   find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' \) -print0 |
     while read -r -d $'\0' file; do
       log "Removing linkerd-cni config from $file"
@@ -176,104 +176,113 @@ create_cni_conf() {
   CNI_NETWORK_CONFIG="${CNI_NETWORK_CONFIG:-}"
 
   # If the CNI Network Config has been overwritten, then use template from file
-  if [ -e "${CNI_NETWORK_CONFIG_FILE}" ]; then
-    log "Using CNI config template from ${CNI_NETWORK_CONFIG_FILE}."
-    cp "${CNI_NETWORK_CONFIG_FILE}" "${TMP_CONF}"
-  elif [ "${CNI_NETWORK_CONFIG}" ]; then
+  if [ -e "$CNI_NETWORK_CONFIG_FILE" ]; then
+    log "Using CNI config template from $CNI_NETWORK_CONFIG_FILE."
+    cp -fp "$CNI_NETWORK_CONFIG_FILE" "$TMP_CONF"
+  elif [ "$CNI_NETWORK_CONFIG" ]; then
     log 'Using CNI config template from CNI_NETWORK_CONFIG environment variable.'
-    cat >"${TMP_CONF}" <<EOF
-${CNI_NETWORK_CONFIG}
+    cat <<EOF > "$TMP_CONF"
+$CNI_NETWORK_CONFIG
 EOF
   fi
 
   # Use alternative command character "~", since these include a "/".
-  sed -i s~__KUBECONFIG_FILEPATH__~"${DEST_CNI_NET_DIR}/${KUBECONFIG_FILE_NAME}"~g ${TMP_CONF}
+  sed -i s~__KUBECONFIG_FILEPATH__~"$DEST_CNI_NET_DIR/$KUBECONFIG_FILE_NAME"~g "$TMP_CONF"
 
-  log "CNI config: $(cat ${TMP_CONF})"
+  log "CNI config: $(cat "$TMP_CONF")"
 }
 
 install_cni_conf() {
   local cni_conf_path=$1
 
-  local tmp_data=''
-  local conf_data=''
-  if [ -e "${cni_conf_path}" ]; then
-   # Add the linkerd-cni plugin to the existing list
-   tmp_data=$(cat "${TMP_CONF}")
-   conf_data=$(jq --argjson CNI_TMP_CONF_DATA "${tmp_data}" -f /linkerd/filter.jq "${cni_conf_path}")
-   echo "${conf_data}" > ${TMP_CONF}
-  fi
+  local tmp_data=$(cat "$TMP_CONF")
+  local conf_data=$(jq --argjson CNI_TMP_CONF_DATA "$tmp_data" -f /linkerd/filter.jq "$cni_conf_path" || true)
+
+  # Ensure that CNI config file did not disappear during processing.
+  [ -n "$conf_data" ] || return 0
 
-  # If the old config filename ends with .conf, rename it to .conflist, because it has changed to be a list
-  filename=${cni_conf_path##*/}
-  extension=${filename##*.}
+  # Add the linkerd-cni plugin to the existing list.
+  echo "$conf_data" > "$TMP_CONF"
+
+  # If the old config filename ends with .conf, rename it to .conflist because
+  # it has changed to be a list.
+  local filename=${cni_conf_path##*/}
+  local extension=${filename##*.}
   # When this variable has a file, we must delete it later.
   old_file_path=
-  if [ "${filename}" != '01-linkerd-cni.conf' ] && [ "${extension}" = 'conf' ]; then
-   old_file_path=${cni_conf_path}
-   log "Renaming ${cni_conf_path} extension to .conflist"
-   cni_conf_path="${cni_conf_path}list"
+  if [ "$filename" != '01-linkerd-cni.conf' ] && [ "$extension" = 'conf' ]; then
+    old_file_path=$cni_conf_path
+    log "Renaming $cni_conf_path extension to .conflist"
+    cni_conf_path="${cni_conf_path}list"
   fi
 
+  # Store SHA of each patched file in global `CNI_CONF_SHA` variable.
+  #
+  # This must happen in a non-concurrent access context!
+  #
+  # The below logic assumes that the `CNI_CONF_SHA` variable is already a
+  # valid JSON object. So this variable must be initialized with '{}'!
+  #
+  # E.g. (pretty-printed; actual variable stores compact JSON object)
+  #
+  # {
+  #   "/etc/cni/net.d/05-foo.conflist": "b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c",
+  #   "/etc/cni/net.d/10-bar.conflist": "7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730"
+  # }
+  local new_sha=$( (sha256sum "$TMP_CONF" || true) | awk '{print $1}' )
+  CNI_CONF_SHA=$(jq -c --arg f "$cni_conf_path" --arg sha "$new_sha" '. * {$f: $sha}' <<< "$CNI_CONF_SHA")
+
   # Move the temporary CNI config into place.
-  mv "${TMP_CONF}" "${cni_conf_path}" || exit_with_error 'Failed to mv files.'
-  [ -n "$old_file_path" ] && rm -f "${old_file_path}" && log "Removing unwanted .conf file"
+  mv "$TMP_CONF" "$cni_conf_path" || exit_with_error 'Failed to mv files.'
+  [ -n "$old_file_path" ] && rm -f "$old_file_path" && log "Removing unwanted .conf file"
 
-  log "Created CNI config ${cni_conf_path}"
+  log "Created CNI config $cni_conf_path"
 }
 
-# Sync() is responsible for reacting to file system changes. It is used in
-# conjunction with inotify events; sync() is called with the name of the file
-# that has changed, the event type (which can be either 'CREATE', 'DELETE',
-# 'MOVED_TO' or 'MODIFY', and the previously observed SHA of the configuration
-# file.
+# `sync()` is responsible for reacting to file system changes. It is used in
+# conjunction with inotify events; `sync()` is called with the event type (which
+# can be either 'CREATE', 'DELETE', 'MOVED_TO', 'MODIFY' or 'DELETE') and the
+# name of the file that has changed 
 #
-# Based on the changed file and event type, sync() might re-install the CNI
+# Based on the changed file and event type, `sync()` might re-install the CNI
 # plugin's configuration file.
 sync() {
-  local filename=$1
-  local ev=$2
-  local filepath="${HOST_CNI_NET}/$filename"
-
-  local prev_sha=$3
-
-  local config_file_count
-  local new_sha
-  if [ "$ev" = 'CREATE' ] || [ "$ev" = 'MOVED_TO' ] || [ "$ev" = 'MODIFY' ] || [ "$ev" = 'ATTRIB' ]; then
-    # When the event type is 'CREATE', 'MOVED_TO' or 'MODIFY', we check the
-    # previously observed SHA (updated with each file watch) and compare it
-    # against the new file's SHA. If they differ, it means something has
-    # changed.
-    new_sha=$(sha256sum "${filepath}" | while read -r s _; do echo "$s"; done)
-    if [ "$new_sha" != "$prev_sha" ]; then
-      # Create but don't rm old one since we don't know if this will be configured
-      # to run as _the_ cni plugin.
-      log "New/changed file [$filename] detected; re-installing"
-      create_kubeconfig
-      create_cni_conf
-      install_cni_conf "$filepath"
-    else
-      # If the SHA hasn't changed or we get an unrecognised event, ignore it.
-      # When the SHA is the same, we can get into infinite loops whereby a file has
-      # been created and after re-install the watch keeps triggering CREATE events
-      # that never end.
-      log "Ignoring event: $ev $filepath; no real changes detected"
-    fi
+  local ev=$1
+  local file=${2//\/\//\/} # replace "//" with "/"
+
+  [[ "$file" =~ .*.(conflist|conf)$ ]] || return 0
+
+  log "Detected event: $ev $file"
+
+  # Retrieve previous SHA of detected file (if any) and compute current SHA.
+  local previous_sha=$(jq -r --arg f "$file" '.[$f] | select(.)' <<< "$CNI_CONF_SHA")
+  local current_sha=$( (sha256sum "$file" || true) | awk '{print $1}' )
+
+  # If the SHA hasn't changed or the detected file has disappeared, ignore it.
+  # When the SHA is the same, we can get into infinite loops whereby a file
+  # has been created and after re-install the watch keeps triggering MOVED_TO
+  # events that never end.
+  # If the `current_sha` variable is blank then the detected CNI config file has
+  # disappeared and no further action is required.
+  # There exists an unhandled (highly improbable) edge case where a CNI plugin
+  # creates a config file and then _immediately_ removes it again _while_ we are
+  # in the process of patching it. If this happens, we may create a patched CNI
+  # config file that should *not* exist.
+  if [ -n "$current_sha" ] && [ "$current_sha" != "$previous_sha" ]; then
+    log "New/changed file [$file] detected; re-installing"
+    create_kubeconfig
+    create_cni_conf
+    install_cni_conf "$file"
+  else
+    log "Ignoring event: $ev $file; no real changes detected or file disappeared"
   fi
 }
 
 # monitor_cni_config starts a watch on the host's CNI config directory
 monitor_cni_config() {
-  inotifywait -m "${HOST_CNI_NET}" -e create,moved_to,modify,attrib |
+  inotifywait -m "$HOST_CNI_NET" -e create,moved_to,modify |
     while read -r directory action filename; do
-      if [[ "$filename" =~ .*.(conflist|conf)$ ]]; then 
-        log "Detected change in $directory: $action $filename"
-        sync "$filename" "$action" "$cni_conf_sha"
-        # calculate file SHA to use in the next iteration
-        if [[ -e "$directory/$filename" ]]; then
-          cni_conf_sha="$(sha256sum "$directory/$filename" | while read -r s _; do echo "$s"; done)"
-        fi
-      fi
+      sync "$action" "$directory/$filename"
     done
 }
 
@@ -284,16 +293,16 @@ monitor_cni_config() {
 # only reacting to direct creation of a "token" file, or creation of
 # directories containing a "token" file.
 monitor_service_account_token() {
-    inotifywait -m "${SERVICEACCOUNT_PATH}" -e create |
-      while read -r directory _ filename; do
-        target=$(realpath "$directory/$filename")
-        if [[ (-f "$target" && "${target##*/}" == "token") || (-d "$target" && -e "$target/token") ]]; then
-          log "Detected creation of file in $directory: $filename; recreating kubeconfig file"
-          create_kubeconfig
-        else
-          log "Detected creation of file in $directory: $filename; ignoring"
-        fi
-      done
+  inotifywait -m "$SERVICEACCOUNT_PATH" -e create |
+    while read -r directory _ filename; do
+      target=$(realpath "$directory/$filename")
+      if [[ (-f "$target" && "${target##*/}" == "token") || (-d "$target" && -e "$target/token") ]]; then
+        log "Detected creation of file in $directory: $filename; recreating kubeconfig file"
+        create_kubeconfig
+      else
+        log "Detected creation of file in $directory: $filename; ignoring"
+      fi
+    done
 }
 
 log() {
@@ -306,35 +315,32 @@ log() {
 
 # Delete old "interface mode" file, possibly left over from previous versions
 # TODO(alpeb): remove this on stable-2.15
-rm -f "${DEFAULT_CNI_CONF_PATH}"
+rm -f "$DEFAULT_CNI_CONF_PATH"
 
 install_cni_bin
 
-# The CNI config monitor must be set up _before_ we start patching CNI config
-# files!
+# The CNI config monitor must be set up _before_ we start patching existing CNI
+# config files!
 # Otherwise, new CNI config files can be created just _after_ the initial round
 # of patching and just _before_ we set up the `inotifywait` loop to detect new
 # CNI config files.
-cni_conf_sha="__init__"
+CNI_CONF_SHA='{}'
 monitor_cni_config &
 
 # Append our config to any existing config file (*.conflist or *.conf)
-config_files=$(find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \))
+config_files=$(find "$HOST_CNI_NET" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \) | grep -v linkerd || true)
 if [ -z "$config_files" ]; then
-    log "No active CNI configuration files found"
+  log "No active CNI configuration files found"
 else
-  config_file_count=$(echo "$config_files" | grep -v linkerd | sort | wc -l)
-  if [ "$config_file_count" -eq 0 ]; then
-    log "No active CNI configuration files found"
-  else
-    find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \) -print0 |
-      while read -r -d $'\0' file; do
-        log "Trigger CNI config detection for $file"
-        # The following will trigger the `sync()` function via `inotifywait` in
-        # `monitor_cni_config()`.
-        touch "$file"
-      done
-  fi
+  find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \) -print0 |
+    while read -r -d $'\0' file; do
+      log "Trigger CNI config detection for $file"
+      tmp_file="$(mktemp -u /tmp/linkerd-cni.patch-candidate.XXXXXX)"
+      cp -fp "$file" "$tmp_file"
+      # The following will trigger the `sync()` function via filesystem event.
+      # This requires `monitor_cni_config()` to be up and running!
+      mv "$tmp_file" "$file" || exit_with_error 'Failed to mv files.'
+    done
 fi
 
 # Watch in bg so we can receive interrupt signals through 'trap'. From 'man

anyway, as you can see, it's quite a comprehensive change.

[...] then we could just call sync() from inside the find ... | while read file; do ... loop instead of triggering the inotifywait in monitor_cni_config() with a mv or a touch.

...actually, ☝️ this would make the bash code execute concurrently and cause lots of problems with global variables. so we definitely shouldn't do this.

we want to serialize all CNI config file changes in this script. and that would NOT be the case if we put monitor_cni_config() into the background (with &) before the find ... | while read file; do ... loop and then call sync() directly from the loop.

@sdickhoven
Copy link
Contributor Author

sdickhoven commented May 29, 2025

i found a number of additional race conditions in this script.

e.g. this is a race condition.

if the cni config file is removed between

if [ -e "${cni_conf_path}" ]; then

and

conf_data=$(jq --argjson CNI_TMP_CONF_DATA "${tmp_data}" -f /linkerd/filter.jq "${cni_conf_path}")

the script will exit and linkerd-cni's ability to patch cni config files will be permanently disabled.

as you can see, i am doing the following instead which is robust against this edge case:

conf_data=$(jq --argjson CNI_TMP_CONF_DATA "$tmp_data" -f /linkerd/filter.jq "$cni_conf_path" || true)
[ -n "$conf_data" ] || return 0

dependabot bot and others added 19 commits June 11, 2025 08:58
…rd#490)

Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 3.4.0 to 3.5.0.
- [Release notes](https://github.com/docker/setup-qemu-action/releases)
- [Commits](docker/setup-qemu-action@4574d27...5964de0)

---
updated-dependencies:
- dependency-name: docker/setup-qemu-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…rd#493)

Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 3.5.0 to 3.6.0.
- [Release notes](https://github.com/docker/setup-qemu-action/releases)
- [Commits](docker/setup-qemu-action@5964de0...2910929)

---
updated-dependencies:
- dependency-name: docker/setup-qemu-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/cache](https://github.com/actions/cache) from 4.2.1 to 4.2.2.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](actions/cache@0c907a7...d4323d4)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Fixes linkerd/linkerd2#12573

In linkerd#440 we introduced an inotifywait call to detect changes in the
service account token, to be run in parallel to the existing inotifywait
tracking cni config file changes. The cleanup() function didn't account
for the additional inotifywait background process, so the call `kill -s
  KILL "$(pgrep inotifywait)"` errored out as it was expecting just one
  PID.

An improper cleanup leaves the linkerd-cni binary deployed in the node,
along with its config in the cni config file, so it continues to be
called even if the linkerd-cni pod is deleted. The problem is that as
soon as the pod is deleted, the service account token that the
linkerd-cni binary relies on is revoked, causing the linkerd-cni
invocation triggered upon each pod creation in the node to fail with an
Unauthorized error. When rolling out the linkerd-cni pod this will
happen sometimes, if k8s revokes that token fast enough, blocking any
further creation of pods in that node.
cargo-deny-action is broken: EmbarkStudios/cargo-deny-action#91

This change replaces the action with a manual invocation via the dev
container image.
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.35.1 to 1.38.2.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](tokio-rs/tokio@tokio-1.35.1...tokio-1.38.2)

---
updated-dependencies:
- dependency-name: tokio
  dependency-version: 1.38.2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…erd#518)

Fixes linkerd/linkerd2#13976

When the Linkerd CNI-plugin acts on a pod with a `config.linkerd.io/skip-inbound-ports` annotation, it replaces the inbound skip ports that the CNI-plugin was configured with instead of adding to them.  The CNI-plugin is configured with the Linkerd proxy's admin and tap ports as skip inbound ports.  Therefore, on any pod with the `config.linkerd.io/skip-inbound-ports` annotation, the admin and tap ports will no longer be skipped by the iptables redirect rules, making these proxy ports inaccessible and causing tap and promethus scraping to no longer function.

Instead of replacing, we append the ports specified in `config.linkerd.io/skip-inbound-ports` to the inbound skip ports.  This allows the admin and tap ports to continue to skip iptables redirection and reach the proxy as desired.

This matches the behavior of the linkerd-init container: https://github.com/linkerd/linkerd2/blob/main/charts/partials/templates/_proxy-init.tpl#L25

Tested manually on k3d with Calico and validating that tap works as expected on workloads with `config.linkerd.io/skip-inbound-ports` configured.  Due to the architecture of the cni-plugin executable, it is not well set up to be tested automatically without further refactoring so we omit automated tests here.

Signed-off-by: Alex Leong <[email protected]>
…erd#517)

Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 4.1.9 to 4.3.0.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@cc20338...d3f86a1)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: 4.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…erd#510)

Bumps [sigstore/cosign-installer](https://github.com/sigstore/cosign-installer) from 3.8.1 to 3.8.2.
- [Release notes](https://github.com/sigstore/cosign-installer/releases)
- [Commits](sigstore/cosign-installer@d7d6bc7...3454372)

---
updated-dependencies:
- dependency-name: sigstore/cosign-installer
  dependency-version: 3.8.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [ring](https://github.com/briansmith/ring) from 0.17.9 to 0.17.14.
- [Changelog](https://github.com/briansmith/ring/blob/main/RELEASES.md)
- [Commits](https://github.com/briansmith/ring/commits)

---
updated-dependencies:
- dependency-name: ring
  dependency-version: 0.17.14
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.33.0 to 0.38.0.
- [Commits](golang/net@v0.33.0...v0.38.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-version: 0.38.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…nkerd#509)

Bumps [softprops/action-gh-release](https://github.com/softprops/action-gh-release) from 2.2.1 to 2.2.2.
- [Release notes](https://github.com/softprops/action-gh-release/releases)
- [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md)
- [Commits](softprops/action-gh-release@c95fe14...da05d55)

---
updated-dependencies:
- dependency-name: softprops/action-gh-release
  dependency-version: 2.2.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…3.0 (linkerd#506)

Bumps [github.com/containernetworking/cni](https://github.com/containernetworking/cni) from 1.2.3 to 1.3.0.
- [Release notes](https://github.com/containernetworking/cni/releases)
- [Commits](containernetworking/cni@v1.2.3...v1.3.0)

---
updated-dependencies:
- dependency-name: github.com/containernetworking/cni
  dependency-version: 1.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/cache](https://github.com/actions/cache) from 4.2.2 to 4.2.3.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](actions/cache@d4323d4...5a3ec84)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [docker/login-action](https://github.com/docker/login-action) from 3.3.0 to 3.4.0.
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](docker/login-action@9780b0c...74a5d14)

---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [DavidAnson/markdownlint-cli2-action](https://github.com/davidanson/markdownlint-cli2-action) from 19.1.0 to 20.0.0.
- [Release notes](https://github.com/davidanson/markdownlint-cli2-action/releases)
- [Commits](DavidAnson/markdownlint-cli2-action@05f3221...992badc)

---
updated-dependencies:
- dependency-name: DavidAnson/markdownlint-cli2-action
  dependency-version: 20.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps alpine from 3.21.3 to 3.22.0.

---
updated-dependencies:
- dependency-name: alpine
  dependency-version: 3.22.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…inkerd#527)

Bumps [actions/checkout](https://github.com/actions/checkout) from 3df4ab11eba7bda6032a0b82a6bb43b11571feac to 09d2acae674a48949e3602304ab46fd20ae0c42f.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@3df4ab1...09d2aca)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '09d2acae674a48949e3602304ab46fd20ae0c42f'
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@sdickhoven sdickhoven force-pushed the BUG-fix-race-condition branch from 332dc5c to 6c19648 Compare June 11, 2025 13:01
@sdickhoven
Copy link
Contributor Author

sdickhoven commented Jun 11, 2025

ok. that seems to have worked. 😌

now that i know how to edit past commits, would you like me to also prepend my commit messages with e.g.

fix(linkerd-cni): ...

?

...though that would require another force-push.

Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, LGTM! -- Tested fine in EKS with aws-cni and chained cilium-cni 👍

@alpeb alpeb merged commit cc02f36 into linkerd:main Jun 11, 2025
10 checks passed
@sdickhoven
Copy link
Contributor Author

hi @alpeb 👋

any idea when you are going to create a new release / docker image for the cni plugin with the updated install-cni.sh?

@cratelyn
Copy link
Member

cratelyn commented Jul 2, 2025

hi @alpeb 👋

any idea when you are going to create a new release / docker image for the cni plugin with the updated install-cni.sh?

hey @sdickhoven just a heads up that @alpeb is out of the office this week, but we can likely create a new release once he is back.

@sdickhoven
Copy link
Contributor Author

hey @sdickhoven just a heads up that @alpeb is out of the office this week, but we can likely create a new release once he is back.

thanks, @cratelyn 🙏

no rush. ☺️

i just wanted to get an idea of roughly when we can expect a release. that's all.

happy to wait until @alpeb is back from vacation (and longer too). 🏝️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants