-
Notifications
You must be signed in to change notification settings - Fork 401
Move start_doing_background_updates()
to HomeServer.start_background_tasks()
#19057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Move start_doing_background_updates()
to HomeServer.start_background_tasks()
#19057
Conversation
@@ -1 +1 @@ | |||
Move `start_doing_background_updates()` to `SynapseHomeServer.start_background_tasks()`. | |||
Move `start_doing_background_updates()` to `HomeServer.start_background_tasks()`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the changelog from #19036 to merge with this one.
self.get_task_scheduler() | ||
self.get_common_usage_metrics_manager().setup() | ||
start_phone_stats_home(self) | ||
self.get_datastores().main.db_pool.updates.start_doing_background_updates() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To call this out again; before #19036 (comment), start_doing_background_updates()
only ran on the main Synapse instance.
With this change, it runs on whatever instance is configured to run_background_tasks_on
.
Based on the previous discussion, we thought this was fine. I just didn't have this in the correct spot to match that description.
See #19057 (comment) Running any `HomeserverTestCase` results in: ``` $ SYNAPSE_TEST_LOG_LEVEL=DEBUG poetry run trial tests.rest.client.sliding_sync.test_sliding_sync.SlidingSyncTestCase_new.test_sync_list ... builtins.AssertionError: Expected `call_later` callback from the reactor to start with the sentinel logcontext but saw run_bg_updates. In other words, another task shouldn't have leaked their logcontext to us. ```
This already happens as part of `HomeServer.start_background_tasks()`
async def run_bg_updates() -> None: | ||
with LoggingContext(name="run_bg_updates", server_name=server_name): | ||
self.get_success(stor.db_pool.updates.run_background_updates(False)) | ||
|
||
hs = setup_test_homeserver( | ||
cleanup_func=self.addCleanup, | ||
server_name=server_name, | ||
config=config_obj, | ||
reactor=reactor, | ||
clock=clock, | ||
**extra_homeserver_attributes, | ||
) | ||
stor = hs.get_datastores().main | ||
|
||
# Run the database background updates, when running against "master". | ||
if hs.__class__.__name__ == "TestHomeServer": | ||
self.get_success(run_bg_updates()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to duplicate running the database background updates here now that we do it in HomeServer.start_background_tasks()
.
We've now updated to just wait for the database background updates to finish which ensures that the homeserver is starting and driving the background to completion as part of its normal process. (less snowflake behavior in tests)
|
||
async def run_bg_updates() -> None: | ||
with LoggingContext(name="run_bg_updates", server_name=server_name): | ||
self.get_success(stor.db_pool.updates.run_background_updates(False)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One difference to note here is that we're running with run_background_updates(False)
which sets argument sleep: False
and determines:
synapse/synapse/storage/background_updates.py
Lines 526 to 527 in ec7554b
sleep: Whether to limit how quickly we run background updates or | |
not. |
Versus now where we run this via start_doing_background_updates()
and sleep
is configured via hs.config.background_updates.sleep_enabled
and default to True
. This means that we're potentially running slower in tests now. Will post an update to compare before/after to see if it even matters.
synapse/synapse/storage/background_updates.py
Lines 400 to 409 in ec7554b
def start_doing_background_updates(self) -> None: | |
if self.enabled: | |
# if we start a new background update, not all updates are done. | |
self._all_done = False | |
sleep = self.sleep_enabled | |
self.hs.run_as_background_process( | |
"background_updates", | |
self.run_background_updates, | |
sleep, | |
) |
store = self.hs.get_datastores().main | ||
self._wait_for_background_updates(self.hs) | ||
|
||
def _wait_for_background_updates(self, hs: HomeServer) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer use this variant but I think it's useful to reference in the comment down below.
``` builtins.AssertionError: Expected `looping_call` callback from the reactor to start with the sentinel logcontext but saw task-_resumable_task-0-IBzAmHUoepQfLnEA. In other words, another task shouldn't have leaked their logcontext to us. ```
# Mock PaginationHandler.purge_room to sleep for 100s, so we have time to do a second call | ||
# before the purge is over. Note that it doesn't purge anymore, but we don't care. | ||
async def purge_room(room_id: str, force: bool) -> None: | ||
await deferLater(self.hs.get_reactor(), 100, lambda: None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these deferLater
changes have been split off to #19058 so we could merge that first in order to clean up the diff here.
They are changed here because we started seeing these failures in CI with this PR for some reason:
Example failure: https://github.com/element-hq/synapse/actions/runs/18477454390/job/52645183606?pr=19057
builtins.AssertionError: Expected `looping_call` callback from the reactor to start with the sentinel logcontext but saw task-_resumable_task-0-IBzAmHUoepQfLnEA. In other words, another task shouldn't have leaked their logcontext to us.
@@ -1 +1 @@ | |||
Move `start_doing_background_updates()` to `SynapseHomeServer.start_background_tasks()`. | |||
Move `start_doing_background_updates()` to `HomeServer.start_background_tasks()`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PresenceHandlerInitTestCase
failures ❌
PresenceHandlerInitTestCase
purposely tries to disable background tasks to keep PresenceHandler
uninitialized until the test is prepared. It tries to test that "presence state restored from the database [when PresenceHandler
is initialized] should not persist forever".
This conflicts with the changes currently being made where we expect the background tasks to run because we want the database background tasks to complete before we start the tests.
synapse/tests/handlers/test_presence.py
Lines 811 to 817 in ec7554b
class PresenceHandlerInitTestCase(unittest.HomeserverTestCase): | |
def default_config(self) -> JsonDict: | |
config = super().default_config() | |
# Disable background tasks on this worker so that the PresenceHandler isn't | |
# loaded until we request it. | |
config["run_background_tasks_on"] = "other" | |
return config |
Example failures:
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_presence.PresenceHandlerInitTestCase
The "poetry.dev-dependencies" section is deprecated and will be removed in a future version. Use "poetry.group.dev.dependencies" instead.
tests.handlers.test_presence
PresenceHandlerInitTestCase
test_restored_presence_idles ... [ERROR]
test_restored_presence_online_after_sync_0_org_matrix_msc3026_busy ... [ERROR]
test_restored_presence_online_after_sync_1_online ... [ERROR]
test_restored_presence_online_after_sync_2_unavailable ... [ERROR]
test_restored_presence_online_after_sync_3_offline ... [ERROR]
===============================================================================
[ERROR]
Traceback (most recent call last):
File "/home/eric/Documents/github/element/synapse/tests/unittest.py", line 138, in new
return code(orig, *args, **kwargs)
File "/home/eric/Documents/github/element/synapse/tests/unittest.py", line 232, in setUp
return orig()
File "/home/eric/Documents/github/element/synapse/tests/unittest.py", line 404, in setUp
self.hs = self.make_homeserver(self.reactor, self.clock)
File "/home/eric/Documents/github/element/synapse/tests/unittest.py", line 525, in make_homeserver
hs = self.setup_test_homeserver(reactor=reactor, clock=clock)
File "/home/eric/Documents/github/element/synapse/tests/unittest.py", line 733, in setup_test_homeserver
raise AssertionError(
builtins.AssertionError: Timed out waiting for background updates to complete (5s). Did you forget to `start_doing_background_updates()`?
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_idles
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_0_org_matrix_msc3026_busy
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_1_online
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_2_unavailable
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_3_offline
-------------------------------------------------------------------------------
Ran 5 tests in 25.208s
FAILED (errors=5)
@@ -1 +1 @@ | |||
Move `start_doing_background_updates()` to `SynapseHomeServer.start_background_tasks()`. | |||
Move `start_doing_background_updates()` to `HomeServer.start_background_tasks()`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Database isn't setup for tests that use GenericWorkerServer
❌
We have a few tests that use homeserver_to_use=GenericWorkerServer
and are failing with builtins.AssertionError: Timed out waiting for background updates to complete (5s). Did you forget to 'start_doing_background_updates()'?
in CI (from our new assertion).
Error:
Traceback (most recent call last):
File "/home/runner/work/synapse/synapse/tests/unittest.py", line 138, in new
return code(orig, *args, **kwargs)
File "/home/runner/work/synapse/synapse/tests/unittest.py", line 232, in setUp
return orig()
File "/home/runner/work/synapse/synapse/tests/unittest.py", line 404, in setUp
self.hs = self.make_homeserver(self.reactor, self.clock)
File "/home/runner/work/synapse/synapse/tests/app/test_openid_listener.py", line 40, in make_homeserver
hs = self.setup_test_homeserver(homeserver_to_use=GenericWorkerServer)
File "/home/runner/work/synapse/synapse/tests/unittest.py", line 733, in setup_test_homeserver
raise AssertionError(
builtins.AssertionError: Timed out waiting for background updates to complete (5s). Did you forget to `start_doing_background_updates()`?
tests.app.test_openid_listener.FederationReaderOpenIDListenerTests.test_openid_listener_0
tests.app.test_openid_listener.FederationReaderOpenIDListenerTests.test_openid_listener_1
tests.app.test_openid_listener.FederationReaderOpenIDListenerTests.test_openid_listener_2
tests.app.test_openid_listener.FederationReaderOpenIDListenerTests.test_openid_listener_3
...
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_idles
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_0_org_matrix_msc3026_busy
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_1_online
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_2_unavailable
tests.handlers.test_presence.PresenceHandlerInitTestCase.test_restored_presence_online_after_sync_3_offline
...
tests.replication.test_federation_ack.FederationAckTestCase.test_federation_ack_sent
...
tests.storage.test_rollback_worker.WorkerSchemaTests.test_not_upgraded_current_schema_version_with_outstanding_deltas
tests.storage.test_rollback_worker.WorkerSchemaTests.test_not_upgraded_old_schema_version
tests.storage.test_rollback_worker.WorkerSchemaTests.test_rolling_back
Previously, we didn't wait for background updates in these cases and only ran the database background updates for the TestHomeServer
(the default in tests). Basically turning a blind eye to this kind of problem.
Lines 680 to 696 in ec7554b
async def run_bg_updates() -> None: | |
with LoggingContext(name="run_bg_updates", server_name=server_name): | |
self.get_success(stor.db_pool.updates.run_background_updates(False)) | |
hs = setup_test_homeserver( | |
cleanup_func=self.addCleanup, | |
server_name=server_name, | |
config=config_obj, | |
reactor=reactor, | |
clock=clock, | |
**extra_homeserver_attributes, | |
) | |
stor = hs.get_datastores().main | |
# Run the database background updates, when running against "master". | |
if hs.__class__.__name__ == "TestHomeServer": | |
self.get_success(run_bg_updates()) |
Unable to run test by itself
Compounding things is that we have to deal with #15671 (previously matrix-org/synapse#15671) which prevents us from running these tests on their own so it's even harder to debug what's going on.
synapse.storage.prepare_database.UpgradeDatabaseException: Uninitialised database: run the main synapse process to prepare the database schema before starting worker processes.
….start_background_tasks()` (#19036)" (#19059) ### Why See #19036 (comment) Revert while I figure out the tests in #19057
Move
start_doing_background_updates()
toHomeServer.start_background_tasks()
Follow-up to #19036 (reverted in #19059) where I incorrectly had this in
SynapseHomeServer.start_background_tasks()
. This was flawed because ifrun_background_tasks_on
is configured to something other thanmain
, we will never callSynapseHomeServer.start_background_tasks()
specifically (SynapseHomeServer
is for the main Synapse instance), and instead only callHomeServer.start_background_tasks()
(HomeServer
is for main and worker Synapse instances).synapse/synapse/app/_base.py
Lines 682 to 683 in ec7554b
Our goal is to run
start_doing_background_updates()
on whatever homeserver is configured torun_background_tasks_on
.Dev notes
From #18886
homeserver_to_use=GenericWorkerServer
-> #15671 (matrix-org/synapse#15671)
run_background_tasks_on
Pull Request Checklist
EventStore
toEventWorkerStore
.".code blocks
.