Fix DNS resolution performance regression during cloud-init local #6665
base: main
Fixes DNS queries for IP addresses that cause 2+ minute boot delays, particularly with systemd 259+. Moves IP detection earlier in is_resolvable() and removes legacy DNS-dependent metadata URL. Fixes canonical#6641
holmanb left a comment
Thanks for the contribution. Please make sure that you have read the contribution guide.
    metadata_urls = [
        "http://169.254.169.254",
        "http://[fd00:ec2::254]",
        "http://instance-data.:8773",
The ec2 datasource is used by various other clouds besides just ec2 - and unfortunately not all clouds are known, so this change poses a risk.
That is confusing. There should be a separate data source implementation for each provider, even if some are the same or similar, to allow for future changes a cloud provider may implement.
That being said, it is possible to override the metadata_urls; ref: https://cloudinit.readthedocs.io/en/latest/reference/datasources/ec2.html
IMHO the default should be the URLs that are provided by the named data source (in this case EC2). If this breaks other cloud providers that use the same path, then they should create a config setting for metadata_urls, as per the documentation, to add the relevant URLs.
To be sure that I understand: You are saying that breaking clouds is justified because the code is confusing and there is a workaround that involves manual modifications to the image, right?
The data source is named DataSourceEc2.py. EC2 explicitly refers to the Amazon Web Services EC2 service (in fact "Amazon EC2" is a registered trademark). Thus it should IMHO adhere to whatever is standard for Amazon EC2 at the current point in time.
It is fair that other clouds have implemented similar things, but they should either have their own data sources OR there should be a generic data source (DataSourceGeneric.py). They should not rely on Amazon EC2 continuing to do things the same way. What if EC2 changes fundamentally "tomorrow"?
This change in DataSourceEc2.py is not important and I'm happy to back it out.
The important change to resolve the issue is that the check for IP addresses is made earlier in is_resolvable(), so we do not unnecessarily go into the "Detect DNS Redirection check" when we don't even have a proper network and are just trying to query the metadata service (the result of this query is used to set up the network).
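To illustrate the ordering described above, here is a minimal sketch of a resolvability check that returns early for IP literals. It is not the actual cloud-init code: the names `host_is_ip_literal` and `is_resolvable_sketch` and the injectable `resolver` parameter are hypothetical, and the real `is_resolvable()` in util.py does more (including the DNS redirection detection).

```python
import ipaddress
import socket
from urllib.parse import urlparse


def host_is_ip_literal(url: str) -> bool:
    """Return True if the URL's host is an IPv4 or IPv6 literal."""
    # urlparse strips the brackets from IPv6 hosts such as [fd00:ec2::254].
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def is_resolvable_sketch(url: str, resolver=socket.getaddrinfo) -> bool:
    """Hypothetical stand-in for a resolvability check (assumption)."""
    # IP literals never need DNS, so answer before any resolver call.
    if host_is_ip_literal(url):
        return True
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        # Only real host names reach the (potentially slow) DNS path.
        resolver(host, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True
```

With this ordering, `http://169.254.169.254` and `http://[fd00:ec2::254]` never touch the resolver, while a name such as `instance-data.` still goes through DNS.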
I have approved the CLA.
@drzee99 the CLA check is still failing. Please fix.
This reverts commit 5ec0eae. Wrong email in commit.
I have tried to revert commit 5ec0eae, which was submitted with the wrong email address; that is why the CLA check failed. I then committed it again, this time with the correct user name and email set. I don't know if that fixes the workflow.
Fix DNS resolution performance regression during cloud-init local
Summary
This PR addresses critical DNS resolution performance issues during the early cloud-init local stage that cause boot delays of 2+ minutes, particularly with systemd version 259 and later.
Problem
cloud-init local
Solution
1. Optimize IP address handling in util.py
2. Remove legacy DNS-dependent URL from DataSourceEc2.py
http://instance-data.:8773 which is not in current AWS IMDS documentation
Changes
is_resolvable()
Testing
is_resolvable()
Related Issues
Fixes #6641 - Systemd version 259 slows down DNS check during cloud-init local
Backward Compatibility