Skip to content

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Oct 13, 2025

Move start_doing_background_updates() to HomeServer.start_background_tasks()

Follow-up to #19036 (reverted in #19059) where I incorrectly had this in SynapseHomeServer.start_background_tasks(). This was flawed because if run_background_tasks_on is configured to something other than main, we will never call SynapseHomeServer.start_background_tasks() specifically (SynapseHomeServer is for the main Synapse instance), and instead only call HomeServer.start_background_tasks() (HomeServer is for main and worker Synapse instances).

if hs.config.worker.run_background_tasks:
hs.start_background_tasks()

Our goal is to run start_doing_background_updates() on whatever homeserver is configured to run_background_tasks_on.

Dev notes

From #18886

# This will cause all of the relevant storage classes to be instantiated and call
# `register_background_update_handler(...)`,
# `register_background_index_update(...)`,
# `register_background_validate_constraint(...)`, etc so they are available to use
# if we are asked to run those background updates.
hs.get_storage_controllers()

homeserver_to_use=GenericWorkerServer

synapse.storage.prepare_database.UpgradeDatabaseException: Uninitialised database: run the main synapse process to prepare the database schema before starting worker processes.

-> #15671 (matrix-org/synapse#15671)

SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.replication.test_federation_ack.FederationAckTestCase.test_federation_ack_sent
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.app.test_openid_listener.FederationReaderOpenIDListenerTests
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_rollback_worker.WorkerSchemaTests

run_background_tasks_on

SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_presence.PresenceHandlerInitTestCase

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

@@ -1 +1 @@
Move `start_doing_background_updates()` to `SynapseHomeServer.start_background_tasks()`.
Move `start_doing_background_updates()` to `HomeServer.start_background_tasks()`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the changelog from #19036 to merge with this one.

self.get_task_scheduler()
self.get_common_usage_metrics_manager().setup()
start_phone_stats_home(self)
self.get_datastores().main.db_pool.updates.start_doing_background_updates()
Copy link
Contributor Author

@MadLittleMods MadLittleMods Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To call this out again; before #19036 (comment), start_doing_background_updates() only ran on the main Synapse instance.

With this change, it runs on whatever instance is configured to run_background_tasks_on.

Based on the previous discussion, we thought this was fine. I just didn't have this in the correct spot to match that description.

See #19057 (comment)

Running any `HomeserverTestCase` results in:
```
$ SYNAPSE_TEST_LOG_LEVEL=DEBUG poetry run trial tests.rest.client.sliding_sync.test_sliding_sync.SlidingSyncTestCase_new.test_sync_list

...

builtins.AssertionError: Expected `call_later` callback from the reactor to start with the sentinel logcontext but saw run_bg_updates. In other words, another task shouldn't have leaked their logcontext to us.
```
This already happens as part of `HomeServer.start_background_tasks()`
Comment on lines 680 to 696
async def run_bg_updates() -> None:
with LoggingContext(name="run_bg_updates", server_name=server_name):
self.get_success(stor.db_pool.updates.run_background_updates(False))

hs = setup_test_homeserver(
cleanup_func=self.addCleanup,
server_name=server_name,
config=config_obj,
reactor=reactor,
clock=clock,
**extra_homeserver_attributes,
)
stor = hs.get_datastores().main

# Run the database background updates, when running against "master".
if hs.__class__.__name__ == "TestHomeServer":
self.get_success(run_bg_updates())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to duplicate running the database background updates here now that we do it in HomeServer.start_background_tasks().

We've now updated to just wait for the database background updates to finish which ensures that the homeserver is starting and driving the background to completion as part of its normal process. (less snowflake behavior in tests)


async def run_bg_updates() -> None:
with LoggingContext(name="run_bg_updates", server_name=server_name):
self.get_success(stor.db_pool.updates.run_background_updates(False))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One difference to note here is that we're running with run_background_updates(False) which sets argument sleep: False and determines:

sleep: Whether to limit how quickly we run background updates or
not.

Versus now where we run this via start_doing_background_updates() and sleep is configured via hs.config.background_updates.sleep_enabled and default to True. This means that we're potentially running slower in tests now. Will post an update to compare before/after to see if it even matters.

def start_doing_background_updates(self) -> None:
if self.enabled:
# if we start a new background update, not all updates are done.
self._all_done = False
sleep = self.sleep_enabled
self.hs.run_as_background_process(
"background_updates",
self.run_background_updates,
sleep,
)

store = self.hs.get_datastores().main
self._wait_for_background_updates(self.hs)

def _wait_for_background_updates(self, hs: HomeServer) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer use this variant but I think it's useful to reference in the comment down below.

```
builtins.AssertionError: Expected `looping_call` callback from the reactor to start with the sentinel logcontext but saw task-_resumable_task-0-IBzAmHUoepQfLnEA. In other words, another task shouldn't have leaked their logcontext to us.
```
# Mock PaginationHandler.purge_room to sleep for 100s, so we have time to do a second call
# before the purge is over. Note that it doesn't purge anymore, but we don't care.
async def purge_room(room_id: str, force: bool) -> None:
await deferLater(self.hs.get_reactor(), 100, lambda: None)
Copy link
Contributor Author

@MadLittleMods MadLittleMods Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these deferLater changes have been split off to #19058 so we could merge that first in order to clean up the diff here.

They are changed here because we started seeing these failures in CI with this PR for some reason:

Example failure: https://github.com/element-hq/synapse/actions/runs/18477454390/job/52645183606?pr=19057

builtins.AssertionError: Expected `looping_call` callback from the reactor to start with the sentinel logcontext but saw task-_resumable_task-0-IBzAmHUoepQfLnEA. In other words, another task shouldn't have leaked their logcontext to us.

@@ -1 +1 @@
Move `start_doing_background_updates()` to `SynapseHomeServer.start_background_tasks()`.
Move `start_doing_background_updates()` to `HomeServer.start_background_tasks()`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PresenceHandlerInitTestCase failures ❌

PresenceHandlerInitTestCase purposely tries to disable background tasks to keep PresenceHandler uninitialized until the test is prepared. It tries to test that "presence state restored from the database [when PresenceHandler is initialized] should not persist forever".

This conflicts with the changes currently being made where we expect the background tasks to run because we want the database background tasks to complete before we start the tests.

class PresenceHandlerInitTestCase(unittest.HomeserverTestCase):
def default_config(self) -> JsonDict:
config = super().default_config()
# Disable background tasks on this worker so that the PresenceHandler isn't
# loaded until we request it.
config["run_background_tasks_on"] = "other"
return config

Example failures:

SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_presence.PresenceHandlerInitTestCase
The "poetry.dev-dependencies" section is deprecated and will be removed in a future version. Use "poetry.group.dev.dependencies" instead.
tests.handlers.test_presence
  PresenceHandlerInitTestCase
    test_restored_presence_idles ...                                    [ERROR]
    test_restored_presence_online_after_sync_0_org_matrix_msc3026_busy ... [ERROR]
    test_restored_presence_online_after_sync_1_online ...               [ERROR]
    test_restored_presence_online_after_sync_2_unavailable ...          [ERROR]
    test_restored_presence_online_after_sync_3_offline ...              [ERROR]

===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/home/eric/Documents/github/element/synapse/tests/unittest.py", line 138, in new
    return code(orig, *args, **kwargs)
  File "/home/eric/Documents/github/element/synapse/tests/unittest.py", line 232, in setUp
    return orig()
  File "/home/eric/Documents/github/element/synapse/tests/unittest.py", line 404, in setUp
    self.hs = self.make_homeserver(self.reactor, self.clock)
  File "/home/eric/Documents/github/element/synapse/tests/unittest.py", line 525, in make_homeserver
    hs = self.setup_test_homeserver(reactor=reactor, clock=clock)
  File "/home/eric/Documents/github/element/synapse/tests/unittest.py", line 733, in setup_test_homeserver
    raise AssertionError(
builtins.AssertionError: Timed out waiting for background updates to complete (5s). Did you forget to `start_doing_background_updates()`?

tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_idles
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_0_org_matrix_msc3026_busy
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_1_online
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_2_unavailable
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_3_offline
-------------------------------------------------------------------------------
Ran 5 tests in 25.208s

FAILED (errors=5)

@@ -1 +1 @@
Move `start_doing_background_updates()` to `SynapseHomeServer.start_background_tasks()`.
Move `start_doing_background_updates()` to `HomeServer.start_background_tasks()`.
Copy link
Contributor Author

@MadLittleMods MadLittleMods Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Database isn't setup for tests that use GenericWorkerServer

We have a few tests that use homeserver_to_use=GenericWorkerServer and are failing with builtins.AssertionError: Timed out waiting for background updates to complete (5s). Did you forget to 'start_doing_background_updates()'? in CI (from our new assertion).

Error: 
Traceback (most recent call last):
  File "/home/runner/work/synapse/synapse/tests/unittest.py", line 138, in new
    return code(orig, *args, **kwargs)
  File "/home/runner/work/synapse/synapse/tests/unittest.py", line 232, in setUp
    return orig()
  File "/home/runner/work/synapse/synapse/tests/unittest.py", line 404, in setUp
    self.hs = self.make_homeserver(self.reactor, self.clock)
  File "/home/runner/work/synapse/synapse/tests/app/test_openid_listener.py", line 40, in make_homeserver
    hs = self.setup_test_homeserver(homeserver_to_use=GenericWorkerServer)
  File "/home/runner/work/synapse/synapse/tests/unittest.py", line 733, in setup_test_homeserver
    raise AssertionError(
builtins.AssertionError: Timed out waiting for background updates to complete (5s). Did you forget to `start_doing_background_updates()`?

tests.app.test_openid_listener.FederationReaderOpenIDListenerTests.test_openid_listener_0
tests.app.test_openid_listener.FederationReaderOpenIDListenerTests.test_openid_listener_1
tests.app.test_openid_listener.FederationReaderOpenIDListenerTests.test_openid_listener_2
tests.app.test_openid_listener.FederationReaderOpenIDListenerTests.test_openid_listener_3

...

tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_idles
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_0_org_matrix_msc3026_busy
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_1_online
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_2_unavailable
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_3_offline

...

tests.replication.test_federation_ack.FederationAckTestCase.test_federation_ack_sent

...

tests.storage.test_rollback_worker.WorkerSchemaTests.test_not_upgraded_current_schema_version_with_outstanding_deltas
tests.storage.test_rollback_worker.WorkerSchemaTests.test_not_upgraded_old_schema_version
tests.storage.test_rollback_worker.WorkerSchemaTests.test_rolling_back

Previously, we didn't wait for background updates in these cases and only ran the database background updates for the TestHomeServer (the default in tests). Basically turning a blind eye to this kind of problem.

synapse/tests/unittest.py

Lines 680 to 696 in ec7554b

async def run_bg_updates() -> None:
with LoggingContext(name="run_bg_updates", server_name=server_name):
self.get_success(stor.db_pool.updates.run_background_updates(False))
hs = setup_test_homeserver(
cleanup_func=self.addCleanup,
server_name=server_name,
config=config_obj,
reactor=reactor,
clock=clock,
**extra_homeserver_attributes,
)
stor = hs.get_datastores().main
# Run the database background updates, when running against "master".
if hs.__class__.__name__ == "TestHomeServer":
self.get_success(run_bg_updates())

Unable to run test by itself

Compounding things is that we have to deal with #15671 (previously matrix-org/synapse#15671) which prevents us from running these tests on their own so it's even harder to debug what's going on.

synapse.storage.prepare_database.UpgradeDatabaseException: Uninitialised database: run the main synapse process to prepare the database schema before starting worker processes.

MadLittleMods added a commit that referenced this pull request Oct 20, 2025
….start_background_tasks()` (#19036)" (#19059)

### Why

See
#19036 (comment)

Revert while I figure out the tests in
#19057
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant