Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Conversation

sdickhoven
Copy link
Contributor

@sdickhoven sdickhoven commented Oct 15, 2025

this pr addresses a race condition in which the parent process outpaces the forked monitor_cni_config() process.

this race condition was observed in a production cluster and can clearly be seen in the log output:

Screenshot 2025-10-14 at 11 38 17

as can be seen above, the inotifywait is about 140ms late to the party.

the result is that existing cni config files are not patched which then causes the repair controller to incessantly "murder" pods that start up on the doomed worker node.

i should add that we've been running linkerd-cni v1.6.3 (which includes the fix for previously encountered race conditions) since 16 jul 2025 and v1.6.4 since 15 sep 2025 without a problem. yesterday was the first time we've seen this (new) race condition.

…onfig()` process

Signed-off-by: Simon Dickhoven <sdickhoven@everquote.com>
@sdickhoven sdickhoven requested a review from a team as a code owner October 15, 2025 18:08
Comment on lines +355 to +358
monitor_state=$(
(ps --ppid=$monitor_pid -o comm=,state= || true) |
awk '$1 == "inotifywait" && $2 == "S" {print "ok"}'
)
Copy link
Contributor Author

@sdickhoven sdickhoven Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the monitor_state variable will either be an empty string or it will contain (one or more) "ok".

...it should only be one "ok" because there is only one inotifywait process that gets spawned by the monitor_cni_config() function, but i wanted to make it very difficult for future changes to break this logic.

therefore, this logic will continue to work as expected even if multiple inotifywait processes are spawned by monitor_cni_config().

the subsequent check that is responsible for breaking the infinite loop only cares about whether the monitor_state variable is an empty string or not. so anything besides "ok" would work just as well.

i also experimented with only using ps here:

$ ps --ppid=$monitor_pid -C inotifywait -o state=

however it appears as though ps does not like multiple filters (1: --ppid=$monitor_pid, 2: -C inotifywait) and therefore ignores one of the two.

so i opted to have ps emit all processes that are spawned by monitor_cni_config() (which should be only inotifywait) and then have awk check if one of those processes is inotifywait and whether that process is in the state S (="interruptible sleep") which should indicate that it is not in the process of actively starting up but is instead waiting for filesystem events to come in.

https://www.man7.org/linux/man-pages/man1/ps.1.html#PROCESS_STATE_CODES

the = after the output column name (comm=,state=) tells ps to omit the header for that column. this is not strictly necessary for the logic to work but we don't need the headers so i added it.

$ ps -o comm,state
COMMAND         S
bash            S
ps              R

vs

$ ps -o comm=,state=
bash            S
ps              R

the || true is there because if ps doesn't find any processes (which is the whole reason for this logic to exist in the first place), it exits with a non-zero exit code (which would then abort the entire script... which we don't want).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could get much more elaborate here and use the filesystem for ipc between the main process and the forked monitor_cni_config() process but that would require a lot more logic.

so i opted to keep things relatively straight-forward.

awk '$1 == "inotifywait" && $2 == "S" {print "ok"}'
)
[ -z "$monitor_state" ] || break
sleep .1 # 100ms
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that i saw a ~140ms race in production, i figured that waiting in increments of 100ms is reasonable.

@sdickhoven
Copy link
Contributor Author

sdickhoven commented Oct 15, 2025

i tested the logic with the following script

#!/usr/bin/env bash

set -e

#inotifywait -m /tmp &

myfunc() {
  sleep 2
  inotifywait -m .
}

myfunc &
monitor_pid=$!

while true; do
  monitor_state=$(
    (ps --ppid=$monitor_pid -o comm=,state= || true) |
    awk '$1 == "inotifywait" && $2 == "S" {print "ok"}'
  )
  echo "tick $monitor_state"
  [ -z "$monitor_state" ] || break
  sleep .1 # 100ms
done

echo done

wait -n

i also threw in a second inotifywait (commented out above) to verify that only the inotifywait i care about is looked at by my logic.

it is also possible to see the new logic in action by inserting a sleep 2 just before this line (which i also did during testing):

https://github.com/linkerd/linkerd2-proxy-init/blob/cni-plugin/v1.6.4/cni-plugin/deployment/scripts/install-cni.sh#L295

@olix0r
Copy link
Member

olix0r commented Oct 17, 2025

Thanks for the very detailed overview. We should be able to take a deeper look at this next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.