-
Notifications
You must be signed in to change notification settings - Fork 32
OCPBUGS-38759: client: re-use a single file for building the session instead of randomly named files #140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@chrischdi: This pull request references Jira Issue OCPBUGS-38759, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
||
var ( | ||
sharedCredentialsFileMutex sync.Mutex | ||
sharedCredentialsFileName = path.Join(os.TempDir(), "aws-shared-credentials"+rand.String(16)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can rand.String
output characters which are invalid in a path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only uses bcdfghjklmnpqrstvwxz2456789
// We omit vowels from the set of available characters to reduce the chances
// of "bad words" being formed.
alphanums = "bcdfghjklmnpqrstvwxz2456789"
So no :-)
} | ||
|
||
f, err := ioutil.TempFile("", "aws-shared-credentials") | ||
f, err := os.Create(sharedCredentialsFileName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still writing the file every time, even if it now has the same filename.
If this is enough to stem the bleeding then fine, I guess, but we should put a big comment here explaining why we're not using ioutil.TempFile() and why we shouldn't put it back, preferrably with a link to a RHEL bug.
However, ideally we would return the same file if the contents are the same. How about we give the file some name which is unique to the secret's contents and don't create it if it already exists? A couple of ideas:
- aws-shared-credentials-[hash of secret contents]
- aws-shared-credentials-[secret.name]-[secret.uid]-[secret.generation]
The latter seems pretty simple to write. A downside is that over time we accumulate every version of the credentials we have ever seen in the temp directory, but I suspect that would never be an issue in practice and the only actor who can exploit it has no incentive to DoS themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still writing the file every time, even if it now has the same filename.
If this is enough to stem the bleeding then fine, I guess, but we should put a big comment here explaining why we're not using ioutil.TempFile() and why we shouldn't put it back, preferrably with a link to a RHEL bug.
Going to add a comment 👍 agree.
However, ideally we would return the same file if the contents are the same. How about we give the file some name which is unique to the secret's contents and don't create it if it already exists? A couple of ideas:
- aws-shared-credentials-[hash of secret contents]
- aws-shared-credentials-[secret.name]-[secret.uid]-[secret.generation]
The latter seems pretty simple to write. A downside is that over time we accumulate every version of the credentials we have ever seen in the temp directory, but I suspect that would never be an issue in practice and the only actor who can exploit it has no incentive to DoS themselves.
I'd prefer to not cache the credentials on disk. If we want to do some caching instead, we might want to cache the session object in memory and not need the whole write/read/delete cycle at all. But I think this goes too far for this PR.
pkg/client/client.go
Outdated
if len(sessionOptions.SharedConfigFiles) > 0 { | ||
os.Remove(sessionOptions.SharedConfigFiles[0]) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not deleting these files in all error paths. Is it possible we're just leaving a lot of files in tmpfs?
Hmm, probably not, given that your reproducer doesn't do this. However, we should probably still fix it if we're going to continue writing these files on ever call. I recommend this should:
- Be called from a defer() immediately after calling
sharedCredentialsFileFromSecret()
- (and be implemented as an unconditional for loop)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, PTAL
/test e2e-aws-serial |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/jira refresh |
@chrischdi: This pull request references Jira Issue OCPBUGS-38759, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/test okd-scos-e2e-aws-ovn |
/test e2e-aws-serial 4h timeout |
/test e2e-aws-serial |
@chrischdi I can see the serial has passed but failed on the gather phase due to timeout. I think it might be necessary to shard it (see: openshift/release#64853 for an example) |
@huali9 can you help to test this? thanks! |
/assign @mdbooth Matt are you happy with this? TY |
/test help |
@chrischdi: The specified target(s) for
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/test pull-ci-openshift-machine-api-provider-aws-main-e2e-aws-serial-1of2 |
@chrischdi: The specified target(s) for
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/test pull-ci-openshift-machine-api-provider-aws-main-e2e-aws-serial-1of2 |
@chrischdi: The specified target(s) for
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
80f90e1
to
36e2ff3
Compare
New changes are detected. LGTM label has been removed. |
(rebased and squashed os it takes the new 1-of-2/2-of-2 jobs for serial to make them succeed... |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test e2e-aws |
@chrischdi: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
The controller was for each call of
newAWSSession
doing the following:/tmp
) with a prefixaws-shared-credentials
.session.NewSessionWithOptions
This started to fill up the slab caches indefinitely and thus leading to unnecessary memory usage.
While the kernel should probably free that memory, we can prevent even getting into that situation by now:
Covering with a mutex has the negative point of not being concurrent, but we only have two controllers, both running with a concurrency of 1.