-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Build: use only one setting for build time limit #12369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Documentation build overviewFiles changed
Show files (1) | 1 modified | 0 added | 0 deleted
|
# TODO remove the use of Docker limits or replace the logic here. This | ||
# was pulling the Docker limits that were set on each stack, but we moved | ||
# to dynamic setting of the Docker limits. This sets a failsafe higher | ||
# limit, but if no builds hit this limit, it should be safe to remove and | ||
# rely on Docker to terminate things on time. | ||
# time_limit = DOCKER_LIMITS['time'] | ||
time_limit = 7200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to this we were allowing 2h builds on .org and .com. However, our docs says we allow 15m and 30m by default respectively.
Changing this will have an impact on projects that were abusing our platform, but also on some valid projects.
I checked this and we have 227 project with >15m successful builds that didn't ask support for this extra time:
In [22]: len(set(Build.objects.filter(date__gt=timezone.now() - timezone.timedelta(days=90), length__gt=60*15, success=True, project__container_time_limit__isnull=True).values_list("project_
⋮ _slug", flat=True)))
Out[22]: 227
We need to decide what to do with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an average of 40 build instances in the last year. This may explain why this number has been being that high...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change was introduced in 2020 by this commit 2e2c6f4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is definitely going to be a pretty big breaking change. We should have a plan here, since otherwise we will just get destroyed with support messages from these projects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the time limit for these projects using a script https://gist.github.com/humitos/d247e99f93ebdd8624d06d663096edbc
@@ -1,23 +1,19 @@ | |||
Interesting settings | |||
==================== | |||
|
|||
DOCKER_LIMITS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is backwards incompatible for local installs. We might want to bump a major version, or build a translation layer from the old setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine bumping a major version. I don't think people is really changing this value, tho. The default should be enough for most of the development use cases.
time_limit = int(project.container_time_limit) | ||
except ValueError: | ||
log.warning("Invalid time_limit for project.") | ||
time_limit = project.container_time_limit or settings.BUILD_TIME_LIMIT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know why the old logic was so defensive here? Seems like it was probably there for a reason. We aren't even casting it to an integer anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I checked this. The field is an IntegerField
so it cannot be anything different than an integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In [22]: set(Project.objects.filter(container_time_limit__isnull=False).values_list("container_time_limit", flat=True))
Out[22]:
{60,
950,
1000,
1200,
1500,
1600,
1800,
2000,
2400,
2500,
2700,
3000,
3200,
3300,
3600,
4200,
5400,
6300,
7200,
10800,
31200,
108000}
@@ -605,7 +632,7 @@ def TEMPLATES(self): | |||
# https://github.com/readthedocs/readthedocs.org/issues/12317#issuecomment-3070950434 | |||
# https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/redis.html#visibility-timeout | |||
BROKER_TRANSPORT_OPTIONS = { | |||
'visibility_timeout': 18000, # 5 hours | |||
'visibility_timeout': BUILD_TIME_LIMIT * 1.15, # 15% more than the build time limit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using buffer of 15%, 20%, and 25% in different places. Should they be the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. These are different on purpose.
The task should be we lowest, then visibility timeout and then build API token. We still need to communicate with API after the task got killed, so we can update it.
Just a note: I pointed out some issues with the commits from this PR in #12393, we should probably return to resolve those issues if they are valid. |
When a customer/user requests longer builds, we set `container_time_limit` at project level. However, if the time is longer than 1h we also need to add `BUILD_NO_ACKS_LATE` to that project. This PR checks for that limit and if it's longer than 1h adds `acks_late=False` to the build task. Related to #12369
Closes #12347
📚 Documentation previews 📚
docs
): https://docs--12369.org.readthedocs.build/12369/dev
): https://dev--12369.org.readthedocs.build/12369/