Skip to content
Merged
Show file tree
Hide file tree
Changes from 23 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
6170762
Add logcontext to looping calls
MadLittleMods Sep 10, 2025
5d2e4c4
Remove `server_name` arg
MadLittleMods Sep 10, 2025
414b178
No need for a description
MadLittleMods Sep 10, 2025
788cd19
Fix db loop
MadLittleMods Sep 10, 2025
77b3228
Revert keyword arg changes
MadLittleMods Sep 10, 2025
91f7bb3
Multiple paragraphs
MadLittleMods Sep 10, 2025
2e88eb3
Update docstring
MadLittleMods Sep 10, 2025
fbf5946
Apply the same treatment to `call_later`
MadLittleMods Sep 10, 2025
3809f3f
Add comments for why we `PreserveLoggingContext()`
MadLittleMods Sep 10, 2025
442dbc0
Add changelog
MadLittleMods Sep 10, 2025
13b938f
Sanity check that we start in the sentinel logcontext
MadLittleMods Sep 10, 2025
04825eb
Improve `Clock.sleep` test
MadLittleMods Sep 10, 2025
f4ad07d
Fix copy/paste typo
MadLittleMods Sep 10, 2025
a8e66e7
Fixup `looping_call`
MadLittleMods Sep 11, 2025
0780183
All logcontext tests should use `@logcontext_clean`
MadLittleMods Sep 11, 2025
c535d8a
Add `test_looping_call`
MadLittleMods Sep 11, 2025
aec7065
Add `test_call_later` and align `call_later` with `looping_call`
MadLittleMods Sep 11, 2025
2aa15b0
Align tests
MadLittleMods Sep 11, 2025
1c001c9
No need to return something
MadLittleMods Sep 11, 2025
13a5a36
Revert "No need to return something"
MadLittleMods Sep 11, 2025
eae83e8
Be more specific on what to return
MadLittleMods Sep 11, 2025
3797515
Fix typo in assertion message
MadLittleMods Sep 11, 2025
9116e74
Fix logcontext leak in email pusher test
MadLittleMods Sep 11, 2025
57cc675
Clarify why
MadLittleMods Sep 12, 2025
45e6b78
Use "foo" instead of "one"
MadLittleMods Sep 12, 2025
4145160
Wrap callback to see underlying failure
MadLittleMods Sep 12, 2025
f2ae33a
Merge branch 'develop' into madlittlemods/looping-call-logcontext
MadLittleMods Sep 17, 2025
160eb63
Merge branch 'develop' into madlittlemods/looping-call-logcontext
MadLittleMods Sep 22, 2025
c8d0f97
Cancelling the deferred doesn't matter AFACT
MadLittleMods Sep 22, 2025
774d25f
Merge branch 'develop' into madlittlemods/looping-call-logcontext
MadLittleMods Sep 22, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/18907.misc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Remove `sentinel` logcontext usage in `Clock` utilities like `looping_call` and `call_later`.
87 changes: 74 additions & 13 deletions synapse/util/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@
from typing_extensions import ParamSpec

from twisted.internet import defer, task
from twisted.internet.defer import Deferred
from twisted.internet.interfaces import IDelayedCall, IReactorTime
from twisted.internet.task import LoopingCall
from twisted.python.failure import Failure
Expand Down Expand Up @@ -121,6 +122,8 @@ class Clock:

async def sleep(self, seconds: float) -> None:
d: defer.Deferred[float] = defer.Deferred()
# Start task in the `sentinel` logcontext, to avoid leaking the current context
# into the reactor once it finishes.
with context.PreserveLoggingContext():
self._reactor.callLater(seconds, d.callback, seconds)
await d
Expand Down Expand Up @@ -149,8 +152,9 @@ def looping_call(
this functionality thanks to this function being a thin wrapper around
`twisted.internet.task.LoopingCall`.

Note that the function will be called with no logcontext, so if it is anything
other than trivial, you probably want to wrap it in run_as_background_process.
Note that the function will be called with generic `looping_call` logcontext, so
if it is anything other than a trivial task, you probably want to wrap it in
`run_as_background_process` to give it more specific label and track metrics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not your problem to deal with, but I've often wondered why we don't just make looping_call take a name and do this wrapping itself. I can't think of a generic looping_call ctx ever being the desirable outcome.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible future 👍


Args:
f: The function to call repeatedly.
Expand All @@ -172,8 +176,9 @@ def looping_call_now(
As with `looping_call`: subsequent calls are not scheduled until after the
the Awaitable returned by a previous call has finished.

Also as with `looping_call`: the function is called with no logcontext and
you probably want to wrap it in `run_as_background_process`.
Note that the function will be called with generic `looping_call` logcontext, so
if it is anything other than a trivial task, you probably want to wrap it in
`run_as_background_process` to give it more specific label and track metrics.

Args:
f: The function to call repeatedly.
Expand All @@ -192,9 +197,43 @@ def _looping_call_common(
**kwargs: P.kwargs,
) -> LoopingCall:
"""Common functionality for `looping_call` and `looping_call_now`"""
call = task.LoopingCall(f, *args, **kwargs)

def wrapped_f(*args: P.args, **kwargs: P.kwargs) -> Deferred:
assert context.current_context() is context.SENTINEL_CONTEXT, (
"Expected `looping_call` callback from the reactor to start with the sentinel logcontext "
f"but saw {context.current_context()}. In other words, another task shouldn't have "
"leaked their logcontext to us."
)

# Because this is a callback from the reactor, we will be using the
# `sentinel` log context at this point. We want the function to log with
# some logcontext as we want to know which server the logs came from.
#
# We use `PreserveLoggingContext` to prevent our new `looping_call`
# logcontext from finishing as soon as we exit this function, in case `f`
# returns an awaitable/deferred which would continue running and may try to
# restore the `loop_call` context when it's done (because it's trying to
# adhere to the Synapse logcontext rules.)
#
# This also ensures that we return to the `sentinel` context when we exit
# this function and yield control back to the reactor to avoid leaking the
# current logcontext to the reactor (which would then get picked up and
# associated with the next thing the reactor does)
with context.PreserveLoggingContext(context.LoggingContext("looping_call")):
# We use `run_in_background` to reset the logcontext after `f` (or the
# awaitable returned by `f`) completes
return context.run_in_background(f, *args, **kwargs)

call = task.LoopingCall(wrapped_f, *args, **kwargs)
call.clock = self._reactor
d = call.start(msec / 1000.0, now=now)
# If `now=true`, the function will be called here immediately so we need to be
# in the sentinel context now.
#
# We want to start the task in the `sentinel` logcontext, to avoid leaking the
# current context into the reactor after the function finishes. TODO: Or perhaps
# someone cancels the looping call (does this matter?).
with context.PreserveLoggingContext():
d = call.start(msec / 1000.0, now=now)
d.addErrback(log_failure, "Looping call died", consumeErrors=False)
return call

Expand All @@ -203,8 +242,9 @@ def call_later(
) -> IDelayedCall:
"""Call something later

Note that the function will be called with no logcontext, so if it is anything
other than trivial, you probably want to wrap it in run_as_background_process.
Note that the function will be called with generic `call_later` logcontext, so
if it is anything other than a trivial task, you probably want to wrap it in
`run_as_background_process` to give it more specific label and track metrics.

Args:
delay: How long to wait in seconds.
Expand All @@ -214,11 +254,32 @@ def call_later(
"""

def wrapped_callback(*args: Any, **kwargs: Any) -> None:
with context.PreserveLoggingContext():
callback(*args, **kwargs)

with context.PreserveLoggingContext():
Copy link
Contributor Author

@MadLittleMods MadLittleMods Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PreserveLoggingContext() doesn't seem necessary but please double-check. We aren't calling anything at this point. Only scheduling something for the reactor to run later

Unless we think that a reactor implementation might call immediately when delay is 0 (reactor.callLater(0, func)). Although, I'm assuming this has the same semantics as setTimeout(code, 0) in JavaScript where it should be run as soon as possible but not right away (put it on the queue).

Please cross-check with whether the test_call_later is good enough to stress this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fair to me. I can't rule out some weird non-obvious interaction but if that was the case, there should've been a comment to that extent.

return self._reactor.callLater(delay, wrapped_callback, *args, **kwargs)
assert context.current_context() is context.SENTINEL_CONTEXT, (
"Expected `call_later` callback from the reactor to start with the sentinel logcontext "
f"but saw {context.current_context()}. In other words, another task shouldn't have "
"leaked their logcontext to us."
)

# Because this is a callback from the reactor, we will be using the
# `sentinel` log context at this point. We want the function to log with
# some logcontext as we want to know which server the logs came from.
#
# We use `PreserveLoggingContext` to prevent our new `call_later`
# logcontext from finishing as soon as we exit this function, in case `f`
# returns an awaitable/deferred which would continue running and may try to
# restore the `loop_call` context when it's done (because it's trying to
# adhere to the Synapse logcontext rules.)
#
# This also ensures that we return to the `sentinel` context when we exit
# this function and yield control back to the reactor to avoid leaking the
# current logcontext to the reactor (which would then get picked up and
# associated with the next thing the reactor does)
with context.PreserveLoggingContext(context.LoggingContext("call_later")):
# We use `run_in_background` to reset the logcontext after `f` (or the
# awaitable returned by `f`) completes
context.run_in_background(callback, *args, **kwargs)

return self._reactor.callLater(delay, wrapped_callback, *args, **kwargs)

def cancel_call_later(self, timer: IDelayedCall, ignore_errs: bool = False) -> None:
try:
Expand Down
3 changes: 2 additions & 1 deletion tests/push/test_email.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@

import synapse.rest.admin
from synapse.api.errors import Codes, SynapseError
from synapse.logging.context import make_deferred_yieldable
from synapse.push.emailpusher import EmailPusher
from synapse.rest.client import login, room
from synapse.rest.synapse.client.unsubscribe import UnsubscribeResource
Expand Down Expand Up @@ -88,7 +89,7 @@ def sendmail(*args: Any, **kwargs: Any) -> Deferred:
# This mocks out synapse.reactor.send_email._sendmail.
d: Deferred = Deferred()
self.email_attempts.append((d, args, kwargs))
return d
return make_deferred_yieldable(d)
Copy link
Contributor Author

@MadLittleMods MadLittleMods Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were leaking the logcontext into the reactor

This took some debug diving to find the culprit but turns out we already had the same fix in the HTTP pusher tests since a267c2e (matrix-org/synapse#4204)


Previously, this was causing some tests to fail:

SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.push.test_email.EmailPusherTests.test_invite_to_empty_room_sends_email

_trial_temp/test.log

2025-09-11 01:01:23-0500 [-] synapse.util - 308 - ERROR - emailpush.process-2 - Looping call died
	Traceback (most recent call last):
	  File "/virtualenvs/matrix-synapse-xCtC9ulO-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 216, in maybeDeferred
	    result = f(*args, **kwargs)
	  File "synapse/synapse/util/__init__.py", line 202, in wrapped_f
	    assert context.current_context() is context.SENTINEL_CONTEXT, (
	           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	AssertionError: Expected `looping_call` callback from the reactor to start with the sentinel logcontext but saw emailpush.process-2. In other words, another task shouldn't have leaked their logcontext to us.

We caught this because we added that new sentinel logcontext assertion (discussed in another thread above)


hs.get_send_email_handler()._sendmail = sendmail # type: ignore[assignment]

Expand Down
Loading
Loading