-
Notifications
You must be signed in to change notification settings - Fork 450
OCPBUGS-54682: Fix - NetworkManager restart or crash renders br-ex unusable #5304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 1 commit
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
5a59b6c
ofport-request dispatcher script takes br-ex(-br) managed by nmstate …
emy 28a1a75
improve of-port request so it uses a common path instead of retrying …
emy 5add825
Revert "improve of-port request so it uses a common path instead of r…
emy 1e8fc02
Revert "ofport-request dispatcher script takes br-ex(-br) managed by …
emy 0a6dfa5
add: fallback check for br-ex-br in ofport NM dispatcher script
emy File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious how this works. If the connection uuid was incorrectly detected, do we not exit on line 46? Are we still getting a uuid there but it's not the correct one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is no connection that matches both "br-ex" and "ovs-bridge" then
PORT_CONNECTION_UUID
will be empty, thus the next line fails and process exits with code 10 or some other big number.If there is only one connection that matches both "br-ex" and "ovs-bridge", then
PORT_CONNECTION_UUID
will get one value and it's a correct value. This is the scenario when everything works okay.If there are two or more connections that match both "br-ex" and "ovs-bridge", then
PORT_CONNECTION_UUID
will get a random value out of 2 or more possible ones. This scenario is bad, undesired. How can it happen that we end up in such a scenario? We would need to have 2 connections that have the samedevice
andtype
. Can it ever happen?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I realize now that you are asking about L46 which uses output of
but I do not see there how we can get an incorrect one easily. At least, in more scenarios than we do today. Let's start with
From that we get
So seems like
PORT_CONNECTION_UUID
is trying to find typeovs*
with the name that is your interface name (e.g.eth0
). For those it's okay because we will not have multipleovs*
with such a name.It could get tricky when you have the run with
INTERFACE_NAME=br-ex
, but that one finishes very quickly, i.e.So, do we actually miss something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. The reason I'm asking is that we're recalculating the port UUID here with a slightly different command than the one above, but if the one above didn't find any UUID then the script would have exited before now. Which leads me to believe that either we get a different UUID from this command for some reason, or we don't need to recalculate it at all.
The latter would make this second call unnecessary, but it would still work fine so I'm mostly making sure I understand the logic correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair.
PORT_CONNECTION_UUID
previously, we can't reach this code here.PORT_CONNECTION_UUID
previously, we recalculate it here but we don't need to.PORT_CONNECTION_UUID
previously, it's actually badI have a gut feeling we are in the scenario (3). Look that the previous
PORT_CONNECTION_UUID
, it only matches forovs
and not forovs-bridge
. Given that forbr-ex*
we have more than one entry matchingovs*
, the way of calculating it here is more robust than the way of calculating it the old way.Maybe we should just move
PORT_CONNECTION_UUID=$(
from there up to L44 ? As I read it, this should work for both old and new way of definingbr-ex
. L44 works correctly for old method and may(?) race for the new one.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure I ended up in scenario (3). I'll check back and I agree that we could/should make this a little more solid for cases where a wrong selection could happen.