Fix `run_coroutine_in_background(...)` incorrectly handling logcontext #18964

MadLittleMods · 2025-09-23T19:45:09Z

Fix run_coroutine_in_background(...) incorrectly handling logcontext.

Regressed in #18900 (comment) (see conversation there for more context)

How is this a regression?

To give this an update with more hindsight; this logic was redundant with the early return and it is safe to remove this complexity ✅

It seems like this actually has to do with completed vs incomplete deferreds...

To explain how things previously worked without the early-return shortcut:

With the normal case of incomplete awaitable, we store the calling_context and the f function is called and runs until it yields to the reactor. Because f follows the logcontext rules, it sets the sentinel logcontext. Then in run_in_background(...), we restore the calling_context, store the current ctx (which is sentinel) and return. When the deferred completes, we restore ctx (which is sentinel) before yielding to the reactor again (all good ✅)

With the other case where we see a completed awaitable, we store the calling_context and the f function is called and runs to completion (no logcontext change). This is where the shortcut would kick in but I'm going to continue explaining as if we commented out the shortcut. -- Then in run_in_background(...), we restore the calling_context, store the current ctx (which is same as the calling_context). Because the deferred is already completed, our extra callback is called immediately and we restore ctx (which is same as the calling_context). Since we never yield to the reactor, the calling_context is perfect as that's what we want again (all good ✅)

But this also means that our early-return shortcut is no longer just an optimization and is necessary to act correctly in the completed awaitable case as we want to return with the calling_context and not reset to the sentinel context. I've updated the comment in #18964 to explain the necessity as it's currently just described as an optimization.

But because we made the same change to run_coroutine_in_background(...) which didn't have the same early-return shortcut, we regressed the correct behavior ❌ . This is being fixed in #18964

-- @MadLittleMods, #18900 (comment)

How did we find this problem?

Spawning from @wrjlewis seeing Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost in the logs:

More logs

synapse.http.request_metrics - 222 - ERROR - sentinel - Trying to stop RequestMetrics in the sentinel context.
2025-09-23 14:43:19,712 - synapse.util.metrics - 212 - WARNING - sentinel - Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost
2025-09-23 14:43:19,713 - synapse.rest.client.sync - 851 - INFO - sentinel - Client has disconnected; not serializing response.
2025-09-23 14:43:19,713 - synapse.http.server - 825 - WARNING - sentinel - Not sending response to request <XForwardedForRequest at 0x7f23e8111ed0 method='POST' uri='/_matrix/client/unstable/org.matrix.simplified_msc3575/sync?pos=281963%2Fs929324_147053_10_2652457_147960_2013_25554_4709564_0_164_2&timeout=30000' clientproto='HTTP/1.1' site='8008'>, already dis
connected.
2025-09-23 14:43:19,713 - synapse.access.http.8008 - 515 - INFO - sentinel - 92.40.194.87 - 8008 - {@me:wi11.co.uk} Processed request: 30.005sec/-8.041sec (0.001sec, 0.000sec) (0.000sec/0.002sec/2) 0B 200! "POST /_matrix/client/unstable/org.matrix.simplified_msc3575/

From the logs there, we can see things relating to typing.get_new_events and /_matrix/client/unstable/org.matrix.simplified_msc3575/sync which led me to trying out Sliding Sync with the typing extension enabled and allowed me to reproduce the problem locally. Sliding Sync is a unique scenario as it's the only place we use gather_optional_coroutines(...) -> run_coroutine_in_background(...) (introduced in #17884) to exhibit this behavior.

Testing strategy

Configure Synapse to enable MSC4186: Simplified Sliding Sync which is actually under MSC3575
```
experimental_features:
  msc3575_enabled: true
```
Start synapse: poetry run synapse_homeserver --config-path homeserver.yaml

Make a Sliding Sync request with one of the extensions enabled

POST http://localhost:8008/_matrix/client/unstable/org.matrix.simplified_msc3575/sync
{
  "lists": {},
  "room_subscriptions": {
        "!FlgJYGQKAIvAscfBhq:my.synapse.linux.server": {
            "required_state": [],
            "timeline_limit": 1
        }
    },
    "extensions": {
        "typing": {
            "enabled": true
        }
    }
}

Open your homeserver logs and notice warnings about Starting ... from sentinel context: metrics will be lost

Dev notes

SYNAPSE_TEST_LOG_LEVEL=DEBUG poetry run trial tests.util.test_logcontext.LoggingContextTestCase

Pull Request Checklist

Pull request is based on the develop branch
Pull request includes a changelog file. The entry should:
- Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
- Use markdown where necessary, mostly for code blocks.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
Code style is correct (run the linters)

This way we don't have to share tricks and ensure both stay in sync with the new meta.

MadLittleMods · 2025-09-23T19:56:03Z

synapse/logging/context.py

-    do not run until called, and so calling an async function without awaiting
-    cannot change the log contexts.


Removed this last sentence because this isn't true on multiple levels:

Calling an async function will run immediately until it yields (hits an await for an incomplete awaitable)

Calling an async function can change the logcontext (and happens all the time). This is exactly why we set the logcontext back to the calling_context before returning.

MadLittleMods · 2025-09-23T19:58:48Z

synapse/logging/context.py

-    # Wrap the coroutine in a deferred, which will have the side effect of executing the
-    # coroutine in the background.
-    d = defer.ensureDeferred(coroutine)
-


To fix the root problem, all we need to do is add this same fix that run_in_background already has for already completed deferreds:

synapse/synapse/logging/context.py

Lines 858 to 862 in 9680804

# The deferred has already completed

if d.called and not d.paused:

# The function should have maintained the logcontext, so we can

# optimise out the messing about

return d

But instead of duplicating all of this specialty logic and context into run_coroutine_in_background(...), we can just simplify to using run_in_background(...). Especially when run_coroutine_in_background(...) is just an ergonomic wrapper around run_in_background(...)

See #18900 (comment) for more information on how this shortcut and the logcontext logic works for run_in_background(...)

Related conversation where I asked why we even have run_coroutine_in_background(...) -> #18900 (comment)

tests/util/test_logcontext.py

synapse/logging/context.py

See #18964 (comment)

…ound-logcontext

MadLittleMods · 2025-09-23T22:51:12Z

synapse/logging/context.py

+        # The function should have maintained the calling logcontext, so we can avoid
+        # messing with it further. Additionally, if the deferred has already completed,
+        # then it would be a mistake to then add a deferred callback (below) to reset
+        # the logcontext to the sentinel logcontext as that would run immediately
+        # (remember our goal is to maintain the calling logcontext when we return).


This is no longer just an optimization since #18900 (comment). It turns out the previous behavior was there for a reason but was redundant with this shortcut. I've updated the comment here to reflect that.

And the problem was that we just didn't also have this shortcut in run_coroutine_in_background(...) so we regressed things.

We've now consolidated things to re-use all of the run_in_background(...) logic

…ound-logcontext

MadLittleMods · 2025-09-23T23:32:26Z

tests/util/test_logcontext.py


-    def _test_run_in_background(self, function: Callable[[], object]) -> defer.Deferred:
-        sentinel_context = current_context()
+    async def _test_run_in_background(self, function: Callable[[], object]) -> None:


The run_in_background test changes should be equivalent to before. This is just a small refactor to clean up these tests (use async/await) to make them more straight-forward.

MadLittleMods · 2025-09-23T23:34:26Z

tests/util/test_logcontext.py

+        await self._test_run_in_background(testfunc)
+
+    @logcontext_clean
+    async def test_run_coroutine_in_background(self) -> None:


New run_coroutine_in_background tests are here.

We could use a similar pattern to run_in_background(...) where we have a test helper that shares a bunch of the logic. Given we only have to test two variants with coroutines, I've opted not to dry this out.

MadLittleMods · 2025-09-23T23:36:11Z

tests/util/test_logcontext.py

+        self._check_test_key("sentinel")
+
+    @logcontext_clean
+    async def test_run_coroutine_in_background_with_nonblocking_coroutine(self) -> None:


I'm not in love with the "nonblocking" terminology used here but I've aligned with the prior art here (test_run_in_background_with_nonblocking_coroutine) and expanded on what that means in the docstring ⏩

MadLittleMods · 2025-09-23T23:38:25Z

changelog.d/18964.misc

@@ -0,0 +1 @@
+Fix `run_coroutine_in_background(...)` incorrectly handling logcontext.


Given this technically regressed in #18900 which is part of 1.139.0rc1, we should land another RC with this PR

(The PR description here at the top explains the regression)

cc @anoadragon453

MadLittleMods · 2025-09-23T23:56:44Z

synapse/logging/context.py

+     - When `run_in_background` is called, the calling logcontext is stored
+       ("original"), we kick off the background task in the current context, and we
+       restore that original context before returning.
+     - For a completed deferred, that's the end of the story.
+     - For an incomplete deferred, when the background task finishes, we don't want to
+       leak our context into the reactor which would erroneously get attached to the
+       next operation picked up by the event loop. We add a callback to the deferred
+       which will clear the logging context after it finishes and yields control back to
+       the reactor.


Based on the explanation laid out in #18900 (comment)

anoadragon453 · 2025-09-24T11:00:32Z

synapse/logging/context.py

+        # (remember our goal is to maintain the calling logcontext when we return).
        return d

    # The function may have reset the context before returning, so we need to restore it


If we're trusting that the function maintained the calling logcontext just above, why here do we not trust that the function maintained the calling logcontext?

This plays into the Synapse logcontext rules:

synapse/docs/log_contexts.md

Lines 114 to 120 in 7708801

**Rules for functions returning awaitables:**

> - If the awaitable is already complete, the function returns with the

> same logcontext it started with.

> - If the awaitable is incomplete, the function clears the logcontext

> before returning; when the awaitable completes, it restores the

> logcontext before running any callbacks.

I've updated the comments with some further details:

synapse/synapse/logging/context.py

Lines 860 to 899 in ffbf188

# The deferred has already completed

if d.called and not d.paused:

# If the function messes with logcontexts, we can assume it follows the Synapse

# logcontext rules (Rules for functions returning awaitables: "If the awaitable

# is already complete, the function returns with the same logcontext it started

# with."). If it function doesn't touch logcontexts at all, we can also assume

# the logcontext is unchanged.

#

# Either way, the function should have maintained the calling logcontext, so we

# can avoid messing with it further. Additionally, if the deferred has already

# completed, then it would be a mistake to then add a deferred callback (below)

# to reset the logcontext to the sentinel logcontext as that would run

# immediately (remember our goal is to maintain the calling logcontext when we

# return).

return d

# Since the function we called may follow the Synapse logcontext rules (Rules for

# functions returning awaitables: "If the awaitable is incomplete, the function

# clears the logcontext before returning"), the function may have reset the

# logcontext before returning, so we need to restore the calling logcontext now

# before we return ourselves.

#

# Our goal is to have the caller logcontext unchanged after firing off the

# background task and returning.

set_current_context(calling_context)

# If the function we called is playing nice and following the Synapse logcontext

# rules, it will restore original calling logcontext when the deferred completes;

# but there is nothing waiting for it, so it will get leaked into the reactor (which

# would then get picked up by the next thing the reactor does). We therefore need to

# reset the logcontext here (set the `sentinel` logcontext) before yielding control

# back to the reactor.

#

# (If this feels asymmetric, consider it this way: we are

# effectively forking a new thread of execution. We are

# probably currently within a ``with LoggingContext()`` block,

# which is supposed to have a single entry and exit point. But

# by spawning off another deferred, we are effectively

# adding a new exit point.)

d.addBoth(_set_context_cb, SENTINEL_CONTEXT)

Let me know if that makes more sense otherwise we can continue iterating in another PR

I see, so there's a difference in the potential current logcontext dependent on whether the awaitable is complete or incomplete. That makes sense why we don't need to bother when it has completed.

Thanks for the clarification and the extra comments!

See #18964 (comment)

…ound-logcontext

MadLittleMods · 2025-09-24T16:10:40Z

Thanks for the review @reivilibre and @anoadragon453 🦢

@wrjlewis also confirmed the problem was solved with these changes ✅ Thanks for the original report and trying things out early 🐃

@MadLittleMods

#18964) Regressed in #18900 (comment) (see conversation there for more context) ### How is this a regression? > To give this an update with more hindsight; this logic *was* redundant with the early return and it is safe to remove this complexity :white_check_mark: > > It seems like this actually has to do with completed vs incomplete deferreds... > > To explain how things previously worked *without* the early-return shortcut: > > With the normal case of **incomplete awaitable**, we store the `calling_context` and the `f` function is called and runs until it yields to the reactor. Because `f` follows the logcontext rules, it sets the `sentinel` logcontext. Then in `run_in_background(...)`, we restore the `calling_context`, store the current `ctx` (which is `sentinel`) and return. When the deferred completes, we restore `ctx` (which is `sentinel`) before yielding to the reactor again (all good :white_check_mark:) > > With the other case where we see a **completed awaitable**, we store the `calling_context` and the `f` function is called and runs to completion (no logcontext change). *This is where the shortcut would kick in but I'm going to continue explaining as if we commented out the shortcut.* -- Then in `run_in_background(...)`, we restore the `calling_context`, store the current `ctx` (which is same as the `calling_context`). Because the deferred is already completed, our extra callback is called immediately and we restore `ctx` (which is same as the `calling_context`). Since we never yield to the reactor, the `calling_context` is perfect as that's what we want again (all good :white_check_mark:) > > --- > > But this also means that our early-return shortcut is no longer just an optimization and is *necessary* to act correctly in the **completed awaitable** case as we want to return with the `calling_context` and not reset to the `sentinel` context. I've updated the comment in #18964 to explain the necessity as it's currently just described as an optimization. > > But because we made the same change to `run_coroutine_in_background(...)` which didn't have the same early-return shortcut, we regressed the correct behavior ❌ . This is being fixed in #18964 > > > *-- @MadLittleMods, #18900 (comment) ### How did we find this problem? Spawning from @wrjlewis [seeing](https://matrix.to/#/!SGNQGPGUwtcPBUotTL:matrix.org/$h3TxxPVlqC6BTL07dbrsz6PmaUoZxLiXnSTEY-QYDtA?via=jki.re&via=matrix.org&via=element.io) `Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost` in the logs: <details> <summary>More logs</summary> ``` synapse.http.request_metrics - 222 - ERROR - sentinel - Trying to stop RequestMetrics in the sentinel context. 2025-09-23 14:43:19,712 - synapse.util.metrics - 212 - WARNING - sentinel - Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost 2025-09-23 14:43:19,713 - synapse.rest.client.sync - 851 - INFO - sentinel - Client has disconnected; not serializing response. 2025-09-23 14:43:19,713 - synapse.http.server - 825 - WARNING - sentinel - Not sending response to request <XForwardedForRequest at 0x7f23e8111ed0 method='POST' uri='/_matrix/client/unstable/org.matrix.simplified_msc3575/sync?pos=281963%2Fs929324_147053_10_2652457_147960_2013_25554_4709564_0_164_2&timeout=30000' clientproto='HTTP/1.1' site='8008'>, already dis connected. 2025-09-23 14:43:19,713 - synapse.access.http.8008 - 515 - INFO - sentinel - 92.40.194.87 - 8008 - {@me:wi11.co.uk} Processed request: 30.005sec/-8.041sec (0.001sec, 0.000sec) (0.000sec/0.002sec/2) 0B 200! "POST /_matrix/client/unstable/org.matrix.simplified_msc3575/ ``` </details> From the logs there, we can see things relating to `typing.get_new_events` and `/_matrix/client/unstable/org.matrix.simplified_msc3575/sync` which led me to trying out Sliding Sync with the typing extension enabled and allowed me to reproduce the problem locally. Sliding Sync is a unique scenario as it's the only place we use `gather_optional_coroutines(...)` -> `run_coroutine_in_background(...)` (introduced in #17884) to exhibit this behavior. ### Testing strategy 1. Configure Synapse to enable [MSC4186](matrix-org/matrix-spec-proposals#4186): Simplified Sliding Sync which is actually under [MSC3575](matrix-org/matrix-spec-proposals#3575) ```yaml experimental_features: msc3575_enabled: true ``` 1. Start synapse: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Make a Sliding Sync request with one of the extensions enabled ```http POST http://localhost:8008/_matrix/client/unstable/org.matrix.simplified_msc3575/sync { "lists": {}, "room_subscriptions": { "!FlgJYGQKAIvAscfBhq:my.synapse.linux.server": { "required_state": [], "timeline_limit": 1 } }, "extensions": { "typing": { "enabled": true } } } ``` 1. Open your homeserver logs and notice warnings about `Starting ... from sentinel context: metrics will be lost`

@MadLittleMods

…t (#18964) Regressed in element-hq/synapse#18900 (comment) (see conversation there for more context) ### How is this a regression? > To give this an update with more hindsight; this logic *was* redundant with the early return and it is safe to remove this complexity :white_check_mark: > > It seems like this actually has to do with completed vs incomplete deferreds... > > To explain how things previously worked *without* the early-return shortcut: > > With the normal case of **incomplete awaitable**, we store the `calling_context` and the `f` function is called and runs until it yields to the reactor. Because `f` follows the logcontext rules, it sets the `sentinel` logcontext. Then in `run_in_background(...)`, we restore the `calling_context`, store the current `ctx` (which is `sentinel`) and return. When the deferred completes, we restore `ctx` (which is `sentinel`) before yielding to the reactor again (all good :white_check_mark:) > > With the other case where we see a **completed awaitable**, we store the `calling_context` and the `f` function is called and runs to completion (no logcontext change). *This is where the shortcut would kick in but I'm going to continue explaining as if we commented out the shortcut.* -- Then in `run_in_background(...)`, we restore the `calling_context`, store the current `ctx` (which is same as the `calling_context`). Because the deferred is already completed, our extra callback is called immediately and we restore `ctx` (which is same as the `calling_context`). Since we never yield to the reactor, the `calling_context` is perfect as that's what we want again (all good :white_check_mark:) > > --- > > But this also means that our early-return shortcut is no longer just an optimization and is *necessary* to act correctly in the **completed awaitable** case as we want to return with the `calling_context` and not reset to the `sentinel` context. I've updated the comment in element-hq/synapse#18964 to explain the necessity as it's currently just described as an optimization. > > But because we made the same change to `run_coroutine_in_background(...)` which didn't have the same early-return shortcut, we regressed the correct behavior ❌ . This is being fixed in element-hq/synapse#18964 > > > *-- @MadLittleMods, element-hq/synapse#18900 (comment) ### How did we find this problem? Spawning from @wrjlewis [seeing](https://matrix.to/#/!SGNQGPGUwtcPBUotTL:matrix.org/$h3TxxPVlqC6BTL07dbrsz6PmaUoZxLiXnSTEY-QYDtA?via=jki.re&via=matrix.org&via=element.io) `Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost` in the logs: <details> <summary>More logs</summary> ``` synapse.http.request_metrics - 222 - ERROR - sentinel - Trying to stop RequestMetrics in the sentinel context. 2025-09-23 14:43:19,712 - synapse.util.metrics - 212 - WARNING - sentinel - Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost 2025-09-23 14:43:19,713 - synapse.rest.client.sync - 851 - INFO - sentinel - Client has disconnected; not serializing response. 2025-09-23 14:43:19,713 - synapse.http.server - 825 - WARNING - sentinel - Not sending response to request <XForwardedForRequest at 0x7f23e8111ed0 method='POST' uri='/_matrix/client/unstable/org.matrix.simplified_msc3575/sync?pos=281963%2Fs929324_147053_10_2652457_147960_2013_25554_4709564_0_164_2&timeout=30000' clientproto='HTTP/1.1' site='8008'>, already dis connected. 2025-09-23 14:43:19,713 - synapse.access.http.8008 - 515 - INFO - sentinel - 92.40.194.87 - 8008 - {@me:wi11.co.uk} Processed request: 30.005sec/-8.041sec (0.001sec, 0.000sec) (0.000sec/0.002sec/2) 0B 200! "POST /_matrix/client/unstable/org.matrix.simplified_msc3575/ ``` </details> From the logs there, we can see things relating to `typing.get_new_events` and `/_matrix/client/unstable/org.matrix.simplified_msc3575/sync` which led me to trying out Sliding Sync with the typing extension enabled and allowed me to reproduce the problem locally. Sliding Sync is a unique scenario as it's the only place we use `gather_optional_coroutines(...)` -> `run_coroutine_in_background(...)` (introduced in element-hq/synapse#17884) to exhibit this behavior. ### Testing strategy 1. Configure Synapse to enable [MSC4186](matrix-org/matrix-spec-proposals#4186): Simplified Sliding Sync which is actually under [MSC3575](matrix-org/matrix-spec-proposals#3575) ```yaml experimental_features: msc3575_enabled: true ``` 1. Start synapse: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Make a Sliding Sync request with one of the extensions enabled ```http POST http://localhost:8008/_matrix/client/unstable/org.matrix.simplified_msc3575/sync { "lists": {}, "room_subscriptions": { "!FlgJYGQKAIvAscfBhq:my.synapse.linux.server": { "required_state": [], "timeline_limit": 1 } }, "extensions": { "typing": { "enabled": true } } } ``` 1. Open your homeserver logs and notice warnings about `Starting ... from sentinel context: metrics will be lost`

@frastefanini

Deployments that make use of the [synapse-s3-storage-provider](https://github.com/matrix-org/synapse-s3-storage-provider) module must upgrade to [v1.6.0](https://github.com/matrix-org/synapse-s3-storage-provider/releases/tag/v1.6.0). Using older versions of the module with this release of Synapse will prevent users from being able to upload or download media. No significant changes since 1.140.0rc1. - Add [a new Media Query by ID Admin API](https://element-hq.github.io/synapse/v1.140/admin_api/media_admin_api.html#query-a-piece-of-media-by-id) that allows server admins to query and investigate the metadata of local or cached remote media via the `origin/media_id` identifier found in a [Matrix Content URI](https://spec.matrix.org/v1.14/client-server-api/#matrix-content-mxc-uris). ([\element-hq#18911](element-hq#18911)) - Add [a new Fetch Event Admin API](https://element-hq.github.io/synapse/v1.140/admin_api/fetch_event.html) to fetch an event by ID. ([\element-hq#18963](element-hq#18963)) - Update [MSC4284: Policy Servers](matrix-org/matrix-spec-proposals#4284) implementation to support signatures when available. ([\element-hq#18934](element-hq#18934)) - Add experimental implementation of the `GET /_matrix/client/v1/rtc/transports` endpoint for the latest draft of [MSC4143: MatrixRTC](matrix-org/matrix-spec-proposals#4143). ([\element-hq#18967](element-hq#18967)) - Expose a `defer_to_threadpool` function in the Synapse Module API that allows modules to run a function on a separate thread in a custom threadpool. ([\element-hq#19032](element-hq#19032)) - Fix room upgrade `room_config` argument and documentation for `user_may_create_room` spam-checker callback. ([\element-hq#18721](element-hq#18721)) - Compute a user's last seen timestamp from their devices' last seen timestamps instead of IPs, because the latter are automatically cleared according to `user_ips_max_age`. ([\element-hq#18948](element-hq#18948)) - Fix bug where ephemeral events were not filtered by room ID. Contributed by @frastefanini. ([\element-hq#19002](element-hq#19002)) - Update Synapse main process version string to include git info. ([\element-hq#19011](element-hq#19011)) - Explain how `Deferred` callbacks interact with logcontexts. ([\element-hq#18914](element-hq#18914)) - Fix documentation for `rc_room_creation` and `rc_reports` to clarify that a `per_user` rate limit is not supported. ([\element-hq#18998](element-hq#18998)) - Remove deprecated `LoggingContext.set_current_context`/`LoggingContext.current_context` methods which already have equivalent bare methods in `synapse.logging.context`. ([\element-hq#18989](element-hq#18989)) - Drop support for unstable field names from the long-accepted [MSC2732](matrix-org/matrix-spec-proposals#2732) (Olm fallback keys) proposal. ([\element-hq#18996](element-hq#18996)) - Cleanly shutdown `SynapseHomeServer` object, allowing artifacts of embedded small hosts to be properly garbage collected. ([\element-hq#18828](element-hq#18828)) - Update OEmbed providers to use 'X' instead of 'Twitter' in URL previews, following a rebrand. Contributed by @HammyHavoc. ([\element-hq#18767](element-hq#18767)) - Fix `server_name` in logging context for multiple Synapse instances in one process. ([\element-hq#18868](element-hq#18868)) - Wrap the Rust HTTP client with `make_deferred_yieldable` so it follows Synapse logcontext rules. ([\element-hq#18903](element-hq#18903)) - Fix the GitHub Actions workflow that moves issues labeled "X-Needs-Info" to the "Needs info" column on the team's internal triage board. ([\element-hq#18913](element-hq#18913)) - Disconnect background process work from request trace. ([\element-hq#18932](element-hq#18932)) - Reduce overall number of calls to `_get_e2e_cross_signing_signatures_for_devices` by increasing the batch size of devices the query is called with, reducing DB load. ([\element-hq#18939](element-hq#18939)) - Update error code used when an appservice tries to masquerade as an unknown device using [MSC4326](matrix-org/matrix-spec-proposals#4326). Contributed by @tulir @ Beeper. ([\element-hq#18947](element-hq#18947)) - Fix `no active span when trying to log` tracing error on startup (when OpenTracing is enabled). ([\element-hq#18959](element-hq#18959)) - Fix `run_coroutine_in_background(...)` incorrectly handling logcontext. ([\element-hq#18964](element-hq#18964)) - Add debug logs wherever we change current logcontext. ([\element-hq#18966](element-hq#18966)) - Update dockerfile metadata to fix broken link; point to documentation website. ([\element-hq#18971](element-hq#18971)) - Note that the code is additionally licensed under the [Element Commercial license](https://github.com/element-hq/synapse/blob/develop/LICENSE-COMMERCIAL) in SPDX expression field configs. ([\element-hq#18973](element-hq#18973)) - Fix logcontext handling in `timeout_deferred` tests. ([\element-hq#18974](element-hq#18974)) - Remove internal `ReplicationUploadKeysForUserRestServlet` as a follow-up to the work in element-hq#18581 that moved device changes off the main process. ([\element-hq#18988](element-hq#18988)) - Switch task scheduler from raw logcontext manipulation to using the dedicated logcontext utils. ([\element-hq#18990](element-hq#18990)) - Remove `MockClock()` in tests. ([\element-hq#18992](element-hq#18992)) - Switch back to our own custom `LogContextScopeManager` instead of OpenTracing's `ContextVarsScopeManager` which was causing problems when using the experimental `SYNAPSE_ASYNC_IO_REACTOR` option with tracing enabled. ([\element-hq#19007](element-hq#19007)) - Remove `version_string` argument from `HomeServer` since it's always the same. ([\element-hq#19012](element-hq#19012)) - Remove duplicate call to `hs.start_background_tasks()` introduced from a bad merge. ([\element-hq#19013](element-hq#19013)) - Split homeserver creation (`create_homeserver`) and setup (`setup`). ([\element-hq#19015](element-hq#19015)) - Swap near-end-of-life `macos-13` GitHub Actions runner for the `macos-15-intel` variant. ([\element-hq#19025](element-hq#19025)) - Introduce `RootConfig.validate_config()` which can be subclassed in `HomeServerConfig` to do cross-config class validation. ([\element-hq#19027](element-hq#19027)) - Allow any command of the `release.py` script to accept a `--gh-token` argument. ([\element-hq#19035](element-hq#19035)) * Bump Swatinem/rust-cache from 2.8.0 to 2.8.1. ([\element-hq#18949](element-hq#18949)) * Bump actions/cache from 4.2.4 to 4.3.0. ([\element-hq#18983](element-hq#18983)) * Bump anyhow from 1.0.99 to 1.0.100. ([\element-hq#18950](element-hq#18950)) * Bump authlib from 1.6.3 to 1.6.4. ([\element-hq#18957](element-hq#18957)) * Bump authlib from 1.6.4 to 1.6.5. ([\element-hq#19019](element-hq#19019)) * Bump bcrypt from 4.3.0 to 5.0.0. ([\element-hq#18984](element-hq#18984)) * Bump docker/login-action from 3.5.0 to 3.6.0. ([\element-hq#18978](element-hq#18978)) * Bump lxml from 6.0.0 to 6.0.2. ([\element-hq#18979](element-hq#18979)) * Bump phonenumbers from 9.0.13 to 9.0.14. ([\element-hq#18954](element-hq#18954)) * Bump phonenumbers from 9.0.14 to 9.0.15. ([\element-hq#18991](element-hq#18991)) * Bump prometheus-client from 0.22.1 to 0.23.1. ([\element-hq#19016](element-hq#19016)) * Bump pydantic from 2.11.9 to 2.11.10. ([\element-hq#19017](element-hq#19017)) * Bump pygithub from 2.7.0 to 2.8.1. ([\element-hq#18952](element-hq#18952)) * Bump regex from 1.11.2 to 1.11.3. ([\element-hq#18981](element-hq#18981)) * Bump serde from 1.0.224 to 1.0.226. ([\element-hq#18953](element-hq#18953)) * Bump serde from 1.0.226 to 1.0.228. ([\element-hq#18982](element-hq#18982)) * Bump setuptools-rust from 1.11.1 to 1.12.0. ([\element-hq#18980](element-hq#18980)) * Bump twine from 6.1.0 to 6.2.0. ([\element-hq#18985](element-hq#18985)) * Bump types-pyyaml from 6.0.12.20250809 to 6.0.12.20250915. ([\element-hq#19018](element-hq#19018)) * Bump types-requests from 2.32.4.20250809 to 2.32.4.20250913. ([\element-hq#18951](element-hq#18951)) * Bump typing-extensions from 4.14.1 to 4.15.0. ([\element-hq#18956](element-hq#18956))

MadLittleMods added 3 commits September 23, 2025 14:31

Add failing test

00a4076

Fix logcontext being messed up if the deferred was already complete

29b121e

Simplify run_coroutine_in_background to use run_in_background

c1f2431

This way we don't have to share tricks and ensure both stay in sync with the new meta.

MadLittleMods added the A-Logging label Sep 23, 2025

MadLittleMods mentioned this pull request Sep 23, 2025

Remove sentinel logcontext where we log in Synapse #18905

Closed

MadLittleMods commented Sep 23, 2025

View reviewed changes

No longer just an optimization

2feae9a

MadLittleMods commented Sep 23, 2025

View reviewed changes

tests/util/test_logcontext.py Outdated Show resolved Hide resolved

MadLittleMods commented Sep 23, 2025

View reviewed changes

synapse/logging/context.py Outdated Show resolved Hide resolved

MadLittleMods mentioned this pull request Sep 23, 2025

Better explain logcontext in run_in_background(...), run_as_background_process(...), and the sentinel logcontext #18900

Merged

4 tasks

MadLittleMods added 4 commits September 23, 2025 17:37

Uncomment @logcontext_clean

ba22b73

See #18964 (comment)

Better comments

7fcec49

Add changelog

ade9853

Merge branch 'develop' into madlittlemods/fix-run_coroutine_in_backgr…

de7d49d

…ound-logcontext

MadLittleMods commented Sep 23, 2025

View reviewed changes

MadLittleMods added 4 commits September 23, 2025 17:51

Merge branch 'develop' into madlittlemods/fix-run_coroutine_in_backgr…

b60cd31

…ound-logcontext

Refactor run_in_background tests to be a little more straight-forward

c6e8141

Better test descriptions

2ae72a6

Align names

caaa6a5

MadLittleMods commented Sep 23, 2025

View reviewed changes

Fix lints

c9d5cda

MadLittleMods commented Sep 23, 2025

View reviewed changes

MadLittleMods added the X-Release-Blocker label Sep 23, 2025

MadLittleMods commented Sep 23, 2025

View reviewed changes

MadLittleMods marked this pull request as ready for review September 23, 2025 23:58

MadLittleMods requested a review from a team as a code owner September 23, 2025 23:58

reivilibre approved these changes Sep 24, 2025

View reviewed changes

anoadragon453 reviewed Sep 24, 2025

View reviewed changes

MadLittleMods added 2 commits September 24, 2025 09:44

Clarify why we can assume the logcontext is unchanged

8e5935c

See #18964 (comment)

Merge branch 'develop' into madlittlemods/fix-run_coroutine_in_backgr…

ffbf188

…ound-logcontext

MadLittleMods enabled auto-merge (squash) September 24, 2025 15:15

MadLittleMods merged commit 0458f69 into develop Sep 24, 2025
76 of 78 checks passed

MadLittleMods deleted the madlittlemods/fix-run_coroutine_in_background-logcontext branch September 24, 2025 15:24

		do not run until called, and so calling an async function without awaiting
		cannot change the log contexts.

	# The deferred has already completed
	if d.called and not d.paused:
	# The function should have maintained the logcontext, so we can
	# optimise out the messing about
	return d

		@@ -0,0 +1 @@
		Fix `run_coroutine_in_background(...)` incorrectly handling logcontext.

	Rules for functions returning awaitables:

	> - If the awaitable is already complete, the function returns with the
	> same logcontext it started with.
	> - If the awaitable is incomplete, the function clears the logcontext
	> before returning; when the awaitable completes, it restores the
	> logcontext before running any callbacks.

	# The deferred has already completed
	if d.called and not d.paused:
	# If the function messes with logcontexts, we can assume it follows the Synapse
	# logcontext rules (Rules for functions returning awaitables: "If the awaitable
	# is already complete, the function returns with the same logcontext it started
	# with."). If it function doesn't touch logcontexts at all, we can also assume
	# the logcontext is unchanged.
	#
	# Either way, the function should have maintained the calling logcontext, so we
	# can avoid messing with it further. Additionally, if the deferred has already
	# completed, then it would be a mistake to then add a deferred callback (below)
	# to reset the logcontext to the sentinel logcontext as that would run
	# immediately (remember our goal is to maintain the calling logcontext when we
	# return).
	return d

	# Since the function we called may follow the Synapse logcontext rules (Rules for
	# functions returning awaitables: "If the awaitable is incomplete, the function
	# clears the logcontext before returning"), the function may have reset the
	# logcontext before returning, so we need to restore the calling logcontext now
	# before we return ourselves.
	#
	# Our goal is to have the caller logcontext unchanged after firing off the
	# background task and returning.
	set_current_context(calling_context)

	# If the function we called is playing nice and following the Synapse logcontext
	# rules, it will restore original calling logcontext when the deferred completes;
	# but there is nothing waiting for it, so it will get leaked into the reactor (which
	# would then get picked up by the next thing the reactor does). We therefore need to
	# reset the logcontext here (set the `sentinel` logcontext) before yielding control
	# back to the reactor.
	#
	# (If this feels asymmetric, consider it this way: we are
	# effectively forking a new thread of execution. We are
	# probably currently within a ``with LoggingContext()`` block,
	# which is supposed to have a single entry and exit point. But
	# by spawning off another deferred, we are effectively
	# adding a new exit point.)
	d.addBoth(_set_context_cb, SENTINEL_CONTEXT)

Fix run_coroutine_in_background(...) incorrectly handling logcontext #18964

Fix run_coroutine_in_background(...) incorrectly handling logcontext #18964

Uh oh!

Conversation

MadLittleMods commented Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How is this a regression?

How did we find this problem?

Testing strategy

Dev notes

Pull Request Checklist

Uh oh!

MadLittleMods Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MadLittleMods Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

MadLittleMods Sep 23, 2025

Choose a reason for hiding this comment

Uh oh!

MadLittleMods Sep 23, 2025

Choose a reason for hiding this comment

Uh oh!

MadLittleMods Sep 23, 2025

Choose a reason for hiding this comment

Uh oh!

MadLittleMods Sep 23, 2025

Choose a reason for hiding this comment

Uh oh!

MadLittleMods Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MadLittleMods Sep 23, 2025

Choose a reason for hiding this comment

Uh oh!

anoadragon453 Sep 24, 2025

Choose a reason for hiding this comment

Uh oh!

MadLittleMods Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anoadragon453 Sep 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

MadLittleMods commented Sep 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix `run_coroutine_in_background(...)` incorrectly handling logcontext #18964

Fix `run_coroutine_in_background(...)` incorrectly handling logcontext #18964

MadLittleMods commented Sep 23, 2025 •

edited

Loading

MadLittleMods Sep 23, 2025 •

edited

Loading

MadLittleMods Sep 23, 2025 •

edited

Loading

MadLittleMods Sep 23, 2025 •

edited

Loading

MadLittleMods Sep 24, 2025 •

edited

Loading