Merged
Changes from 2 commits
24 changes: 10 additions & 14 deletions docs/dev/settings.rst
@@ -1,23 +1,19 @@
Interesting settings
====================

DOCKER_LIMITS
Member

This is backwards incompatible for local installs. We might want to bump a major version, or build a translation layer from the old setting.

Member Author

I'm fine bumping a major version. I don't think people are really changing this value, though. The default should be enough for most development use cases.

-------------
BUILD_MEMORY_LIMIT
------------------

A dictionary of limits to virtual machines. These limits include:
The maximum memory allocated to the virtual machine.
If this limit is hit, build processes will be automatically killed.
Examples: '200m' for 200MB of total memory, or '2g' for 2GB of total memory.

time
An integer representing the total allowed time limit (in
seconds) of build processes. This time limit affects the parent
process to the virtual machine and will force a virtual machine
to die if a build is still running after the allotted time
expires.
BUILD_TIME_LIMIT
----------------

memory
The maximum memory allocated to the virtual machine. If this
limit is hit, build processes will be automatically killed.
Examples: '200m' for 200MB of total memory, or '2g' for 2GB of
total memory.
An integer representing the total allowed time limit (in seconds) of build processes.
This time limit affects the parent process to the virtual machine and will force a virtual machine
to die if a build is still running after the allotted time expires.
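Taken together, a local settings module could pin both values (these are just the documented defaults, shown for illustration):

```python
# Illustrative local override of the documented defaults.
BUILD_MEMORY_LIMIT = "2g"  # maximum memory for the build virtual machine
BUILD_TIME_LIMIT = 900     # maximum build time in seconds (15 minutes)
```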

PRODUCTION_DOMAIN
------------------
3 changes: 2 additions & 1 deletion readthedocs/api/v2/models.py
@@ -18,7 +18,8 @@ def create_key(self, project):
Build API keys are valid for the build time limit plus a 25% margin,
and can be revoked at any time by hitting the /api/v2/revoke/ endpoint.
"""
expiry_date = timezone.now() + timedelta(hours=3)
# Use the build time limit + 25% for the API token
expiry_date = timezone.now() + timedelta(seconds=settings.BUILD_TIME_LIMIT * 1.25)
name_max_length = self.model._meta.get_field("name").max_length
return super().create_key(
# Name is required, so we use the project slug for it.
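The token lifetime now scales with the build time limit. A minimal sketch of the new expiry calculation, assuming the default `BUILD_TIME_LIMIT` of 900 seconds (the `1.25` multiplier is a 25% margin):

```python
from datetime import datetime, timedelta, timezone

BUILD_TIME_LIMIT = 900  # assumed default from settings, in seconds

# Key lifetime: the build time limit plus a 25% margin, so the API key
# outlives any build it was issued for.
expiry_seconds = BUILD_TIME_LIMIT * 1.25
expiry_date = datetime.now(timezone.utc) + timedelta(seconds=expiry_seconds)

print(expiry_seconds)  # 1125.0
```

With the old hard-coded `timedelta(hours=3)`, shrinking the build time limit would have left tokens valid far longer than any build; tying the two together closes that gap.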
13 changes: 5 additions & 8 deletions readthedocs/core/utils/__init__.py
@@ -86,18 +86,15 @@ def prepare_build(
options["queue"] = project.build_queue

# Set per-task time limit
# TODO remove the use of Docker limits or replace the logic here. This
# was pulling the Docker limits that were set on each stack, but we moved
# to dynamic setting of the Docker limits. This sets a failsafe higher
# limit, but if no builds hit this limit, it should be safe to remove and
# rely on Docker to terminate things on time.
# time_limit = DOCKER_LIMITS['time']
time_limit = 7200
Comment on lines -89 to -95
Member Author

Due to this we were allowing 2h builds on .org and .com. However, our docs say we allow 15m and 30m by default, respectively.

Changing this will have an impact on projects that were abusing our platform, but also on some valid projects.

I checked this and we have 227 projects with >15m successful builds that didn't ask support for this extra time:

In [22]: len(set(Build.objects.filter(date__gt=timezone.now() - timezone.timedelta(days=90), length__gt=60*15, success=True, project__container_time_limit__isnull=True).values_list("project__slug", flat=True)))
Out[22]: 227

We need to decide what to do with them.

Member Author

We have an average of 40 build instances over the last year. This may explain why this number has been that high...

Member Author

This change was introduced in 2020 by commit 2e2c6f4.

Member

That is definitely going to be a pretty big breaking change. We should have a plan here, since otherwise we will just get destroyed with support messages from these projects.

Member Author

I updated the time limit for these projects using a script https://gist.github.com/humitos/d247e99f93ebdd8624d06d663096edbc

time_limit = settings.BUILD_TIME_LIMIT
try:
if project.container_time_limit:
time_limit = int(project.container_time_limit)
except ValueError:
log.warning("Invalid time_limit for project.")
log.warning(
"Invalid time_limit for project.",
time_limit=project.container_time_limit,
)

# Add 20% overhead to task, to ensure the build can timeout and the task
# will cleanly finish.
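The fallback-and-override logic above can be sketched as a standalone function (`resolve_time_limit` is a hypothetical name; the real code inlines this in `prepare_build`):

```python
BUILD_TIME_LIMIT = 900  # assumed default from settings, in seconds

def resolve_time_limit(container_time_limit):
    """Resolve the effective build time limit for one project."""
    time_limit = BUILD_TIME_LIMIT
    try:
        if container_time_limit:
            time_limit = int(container_time_limit)
    except ValueError:
        pass  # invalid per-project override: keep the global default
    # Add 20% overhead so the Celery task can time the build out
    # and still finish cleanly.
    return int(time_limit * 1.2)

print(resolve_time_limit(None))    # 1080
print(resolve_time_limit("1800"))  # 2160
```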
1 change: 0 additions & 1 deletion readthedocs/doc_builder/constants.py
@@ -14,7 +14,6 @@
DOCKER_SOCKET = settings.DOCKER_SOCKET
DOCKER_VERSION = settings.DOCKER_VERSION
DOCKER_IMAGE = settings.DOCKER_IMAGE
DOCKER_LIMITS = settings.DOCKER_LIMITS
DOCKER_TIMEOUT_EXIT_CODE = 42
DOCKER_OOM_EXIT_CODE = 137

5 changes: 2 additions & 3 deletions readthedocs/doc_builder/environments.py
@@ -25,7 +25,6 @@

from .constants import DOCKER_HOSTNAME_MAX_LEN
from .constants import DOCKER_IMAGE
from .constants import DOCKER_LIMITS
from .constants import DOCKER_OOM_EXIT_CODE
from .constants import DOCKER_SOCKET
from .constants import DOCKER_TIMEOUT_EXIT_CODE
@@ -581,8 +580,8 @@ class DockerBuildEnvironment(BaseBuildEnvironment):

command_class = DockerBuildCommand
container_image = DOCKER_IMAGE
container_mem_limit = DOCKER_LIMITS.get("memory")
container_time_limit = DOCKER_LIMITS.get("time")
container_mem_limit = settings.BUILD_MEMORY_LIMIT
container_time_limit = settings.BUILD_TIME_LIMIT

def __init__(self, *args, **kwargs):
container_image = kwargs.pop("container_image", None)
10 changes: 3 additions & 7 deletions readthedocs/projects/tasks/utils.py
@@ -162,17 +162,13 @@ def finish_inactive_builds():

A build is considered inactive if it's not in a final state and it has been
"running" for more time than allowed (``Project.container_time_limit``
or ``DOCKER_LIMITS['time']`` plus a 20% margin).
or ``BUILD_TIME_LIMIT`` plus a 20% margin).

These inactive builds will be marked as ``success`` and ``CANCELLED`` with an
``error`` to be communicated to the user.
"""
# TODO similar to the celery task time limit, we can't infer this from
# Docker settings anymore, because Docker settings are determined on the
# build servers dynamically.
# time_limit = int(DOCKER_LIMITS['time'] * 1.2)
# Set time as maximum celery task time limit + 5m
time_limit = 7200 + 300
# TODO: delete this task once we are fully migrated to ``BUILD_HEALTHCHECK``
time_limit = settings.BUILD_TIME_LIMIT * 1.2
delta = datetime.timedelta(seconds=time_limit)
query = (
~Q(state__in=BUILD_FINAL_STATES)
45 changes: 18 additions & 27 deletions readthedocs/settings/base.py
@@ -605,7 +605,7 @@ def TEMPLATES(self):
# https://github.com/readthedocs/readthedocs.org/issues/12317#issuecomment-3070950434
# https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/redis.html#visibility-timeout
BROKER_TRANSPORT_OPTIONS = {
'visibility_timeout': 18000, # 5 hours
'visibility_timeout': self.BUILD_TIME_LIMIT * 1.15, # 15% more than the build time limit
}
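Sizing the visibility timeout off the same setting keeps Redis from redelivering a build task to a second worker while a legitimately long build is still running. A quick check with the assumed 900-second default:

```python
BUILD_TIME_LIMIT = 900  # assumed default from settings, in seconds

# 15% above the longest allowed build; round because the product of an
# int and a float is a float.
visibility_timeout = round(BUILD_TIME_LIMIT * 1.15)
print(visibility_timeout)  # 1035
```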

CELERY_DEFAULT_QUEUE = "celery"
@@ -721,7 +721,13 @@ def TEMPLATES(self):
# since we can't read their config file image choice before cloning
RTD_DOCKER_CLONE_IMAGE = RTD_DOCKER_BUILD_SETTINGS["os"]["ubuntu-22.04"]

def _get_docker_memory_limit(self):
def _get_build_memory_limit(self):
"""
Return the build memory limit based on available system memory.

We subtract ~1000 MB for the overhead of processes and the base system.
"""
try:
total_memory = int(
subprocess.check_output(
@@ -735,46 +741,31 @@ def _get_docker_memory_limit(self):
# int and raise a ValueError
log.exception("Failed to get memory size, using default build limits.")

# Coefficient used to determine build time limit, as a percentage of total
# memory. Historical values here were 0.225 to 0.3.
DOCKER_TIME_LIMIT_COEFF = 0.25
BUILD_TIME_LIMIT = 900 # seconds

@property
def DOCKER_LIMITS(self):
def BUILD_MEMORY_LIMIT(self):
"""
Set docker limits dynamically, if in production, based on system memory.
Set build memory limit dynamically, if in production, based on system memory.

We do this to avoid having separate build images. This assumes 1 build
process per server, which will be allowed to consume all available
memory.

We subtract 750MiB for overhead of processes and base system, and set
the build time as proportional to the memory limit.
"""
# Our normal default
limits = {
"memory": "2g",
"time": 900,
}
default_memory_limit = "7g"

# Only run on our servers
if self.RTD_IS_PRODUCTION:
total_memory, memory_limit = self._get_docker_memory_limit()
if memory_limit:
limits = {
"memory": f"{memory_limit}m",
"time": max(
limits["time"],
round(total_memory * self.DOCKER_TIME_LIMIT_COEFF, -2),
),
}
total_memory, memory_limit = self._get_build_memory_limit()

memory_limit = memory_limit or default_memory_limit
log.info(
"Using dynamic docker limits.",
"Using dynamic build limits.",
hostname=socket.gethostname(),
memory=limits["memory"],
time=limits["time"],
memory=memory_limit,
)
return limits
return memory_limit
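The memory calculation itself can be sketched as a pure function (a hypothetical helper; the real property shells out to read total system memory and falls back to the `"7g"` default when detection fails):

```python
def compute_build_memory_limit(total_memory_mb, overhead_mb=1000):
    """Derive a per-build memory limit from total system memory.

    Assumes one build process per server: reserve ~1000 MB for the base
    system and hand the rest to the build. Returns None when the host is
    too small, signalling the caller to use the static default.
    """
    if total_memory_mb <= overhead_mb:
        return None
    return f"{total_memory_mb - overhead_mb}m"

print(compute_build_memory_limit(8192))  # 7192m
print(compute_build_memory_limit(512))   # None
```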

# Allauth
ACCOUNT_ADAPTER = "readthedocs.core.adapters.AccountAdapter"
2 changes: 1 addition & 1 deletion readthedocs/settings/docker_compose.py
@@ -15,7 +15,7 @@ class DockerBaseSettings(CommunityBaseSettings):
RTD_DOCKER_COMPOSE_NETWORK = "community_readthedocs"
RTD_DOCKER_COMPOSE_VOLUME = "community_build-user-builds"
RTD_DOCKER_USER = f"{os.geteuid()}:{os.getegid()}"
DOCKER_LIMITS = {"memory": "2g", "time": 900}
BUILD_MEMORY_LIMIT = "2g"

PRODUCTION_DOMAIN = os.environ.get("RTD_PRODUCTION_DOMAIN", "devthedocs.org")
PUBLIC_DOMAIN = os.environ.get("RTD_PUBLIC_DOMAIN", "devthedocs.org")
3 changes: 2 additions & 1 deletion readthedocs/settings/test.py
@@ -26,7 +26,8 @@ class CommunityTestSettings(CommunityBaseSettings):
CELERY_ALWAYS_EAGER = True

# Skip automatic detection of build limits for testing
DOCKER_LIMITS = {"memory": "200m", "time": 600}
BUILD_TIME_LIMIT = 600
BUILD_MEMORY_LIMIT = "200m"

CACHES = {
"default": {