diff --git a/docs/dev/celery.rst b/docs/dev/celery.rst
new file mode 100644
index 00000000000..e1597f834c7
--- /dev/null
+++ b/docs/dev/celery.rst
@@ -0,0 +1,169 @@
+Celery Tasks
+============
+
+Read the Docs uses `Celery `_ for asynchronous task processing.
+Tasks are distributed across different queues and handle various operations from building documentation
+to sending notifications.
+
+Task Queues
+-----------
+
+Tasks are organized into several queues:
+
+* ``web`` - General web-related tasks (webhooks, notifications, search indexing)
+* ``build:default`` - Standard documentation builds
+* ``build:large`` - Builds for projects using conda or requiring more resources
+* ``autoscaling`` - Metrics collection tasks (in readthedocs-ext)
+
+Task Organization
+-----------------
+
+Tasks are organized by functional area:
+
+**Build Tasks** (``readthedocs/builds/tasks.py``)
+    Core build management tasks including archiving, version sync, and notifications.
+
+**Project Build Tasks** (``readthedocs/projects/tasks/builds.py``)
+    Main documentation building tasks:
+
+    * ``sync_repository_task`` - Syncs repository branches/tags
+    * ``update_docs_task`` - Builds documentation (main entry point)
+
+**OAuth Tasks** (``readthedocs/oauth/tasks.py``)
+    VCS integration tasks for syncing repositories and handling webhooks.
+
+**Search Tasks** (``readthedocs/search/tasks.py``)
+    Elasticsearch indexing and search query analytics.
+
+**Analytics Tasks** (``readthedocs/analytics/tasks.py``)
+    Page view analytics and data retention.
+
+**Core Tasks** (``readthedocs/core/tasks.py``)
+    Email sending, Redis cleanup, and object deletion.
+
+Key Tasks
+---------
+
+update_docs_task
+~~~~~~~~~~~~~~~~
+
+The main entry point for building documentation. This task:
+
+* Clones the repository
+* Sets up the build environment (Docker or local)
+* Installs dependencies
+* Builds documentation in multiple formats
+* Uploads artifacts to storage
+* Handles failures and retries
+
+It uses the ``UpdateDocsTask`` base class, which provides:
+
+* Automatic retries for ``BuildMaxConcurrencyError``
+* Comprehensive error handling
+* Build state tracking
+* Notification sending
+
+sync_versions_task
+~~~~~~~~~~~~~~~~~~
+
+Syncs version data from the VCS repository to the database:
+
+* Creates new ``Version`` objects for tags/branches
+* Deletes versions that no longer exist in the repository
+* Runs automation rules
+* Updates stable/latest version pointers
+* Triggers builds for new stable versions
+
+send_build_status
+~~~~~~~~~~~~~~~~~
+
+Sends build status to GitHub/GitLab for pull requests and commits.
+Integrates with OAuth services to post commit statuses.
+
+Custom Task Base Classes
+-------------------------
+
+PublicTask
+~~~~~~~~~~
+
+Located in ``readthedocs/core/utils/tasks/public.py``.
+
+Base class for tasks with publicly viewable status. Features:
+
+* Permission checking via the ``permission_check`` decorator
+* Public data exposure for UI consumption
+* Graceful exception handling
+* Task state updates
+
+Example usage:
+
+.. code-block:: python
+
+    @PublicTask.permission_check(user_id_matches_or_superuser)
+    @app.task(queue="web", base=PublicTask)
+    def sync_remote_repositories(user_id):
+        # Task implementation
+        pass
+
+SyncRepositoryMixin
+~~~~~~~~~~~~~~~~~~~
+
+Provides common functionality for repository synchronization tasks,
+including VCS operations and version syncing logic.
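As a rough illustration of the create/delete step listed under ``sync_versions_task`` above, the sketch below diffs the version names already stored against the names reported by the repository. It is not the Read the Docs implementation: ``SyncPlan`` and ``plan_version_sync`` are hypothetical names introduced here, and plain strings stand in for ``Version`` objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    """Hypothetical result type: version names to add and to remove."""

    to_create: frozenset
    to_delete: frozenset


def plan_version_sync(db_names, vcs_names):
    """Compare stored version names with the tags/branches found in the VCS."""
    db = frozenset(db_names)
    vcs = frozenset(vcs_names)
    # Names only in the VCS need new records; names only in the DB are stale.
    return SyncPlan(to_create=vcs - db, to_delete=db - vcs)


# Example data is made up for illustration.
plan = plan_version_sync(
    db_names=["main", "1.0", "old-branch"],
    vcs_names=["main", "1.0", "2.0"],
)
print(sorted(plan.to_create))
print(sorted(plan.to_delete))
```

Computing the plan separately from applying it keeps the comparison easy to test without a database; the real task additionally runs automation rules and updates the stable/latest pointers, which this sketch leaves out.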
+
+Task Routing
+------------
+
+The ``TaskRouter`` class (``readthedocs/builds/tasks.py``) dynamically routes
+build tasks to appropriate queues based on:
+
+* Project configuration (``build_queue`` setting)
+* Conda usage (routes to ``build:large``)
+* Project maturity (new projects get ``build:large``)
+* External version builds (use same queue as default version)
+
+Best Practices
+--------------
+
+When creating new tasks:
+
+1. **Choose the right queue**: Use ``web`` for most tasks, ``build:*`` only for builds
+2. **Set appropriate timeouts**: Use ``time_limit`` and ``soft_time_limit``
+3. **Handle exceptions**: Use ``throws`` tuple for expected exceptions
+4. **Log context**: Use ``structlog.contextvars.bind_contextvars`` for structured logging
+5. **Use locks for uniqueness**: Use ``memcache_lock`` for tasks that shouldn't run concurrently
+6. **Bind tasks when needed**: Use ``bind=True`` to access ``self`` in the task
+
+Example task definition:
+
+.. code-block:: python
+
+    @app.task(
+        queue="web",
+        bind=True,
+        max_retries=3,
+        default_retry_delay=60,
+        time_limit=300,
+    )
+    def my_task(self, project_pk):
+        structlog.contextvars.bind_contextvars(
+            project_pk=project_pk,
+        )
+        log.info("Starting task")
+        # Task implementation
+        pass
+
+Monitoring
+----------
+
+* Task execution is logged with structured logging (structlog)
+* Build tasks report progress through state updates
+* Metrics are collected via periodic tasks (readthedocs-ext)
+* Failed tasks are tracked in Sentry
+
+See Also
+--------
+
+* `Celery Documentation `_
+* :doc:`install` - Setting up development environment
+* :doc:`tests` - Testing Celery tasks
diff --git a/docs/dev/index.rst b/docs/dev/index.rst
index 79032ecf7e9..10a1434096c 100644
--- a/docs/dev/index.rst
+++ b/docs/dev/index.rst
@@ -26,6 +26,7 @@ or taking the open source Read the Docs codebase for your own custom installatio
    front-end
    i18n
    migrations
+   celery
    server-side-search
    search-integration
    aws-temporary-credentials