@anoadragon453 (Member) commented Oct 1, 2025

Attempt to replace manual usage of LoggingContext with the provided module API's run_in_background method.

I'm not entirely convinced about the changes to fetch (and subsequently s3_download_task). The fact we hand it a deferred directly is confusing me.

Spun off from #133. #74 can be closed after this PR is merged.

@anoadragon453 requested a review from a team as a code owner October 1, 2025 16:02


- def s3_download_task(s3_client, bucket, key, extra_args, deferred, parent_logcontext):
+ def s3_download_task(s3_client, bucket, key, extra_args, deferred):
@anoadragon453 (Member Author):

The only real changes here are:

  1. Removing parent_logcontext.
  2. Removing with LoggingContext ... and de-indenting all of the code underneath it.

I think these changes make sense. s3_download_task will use whatever the caller's logcontext is.

And I think we maintain the logcontext when calling s3_download_task ✅ (at least with the suggested patterns).

@anoadragon453 force-pushed the anoa/stop_using_logging_context_directly branch from 77b1bc5 to 5bdb5d9 on October 1, 2025 16:05


We also make `store_file` and `fetch` async, as they are async in the
base class. This also simplifies the implementation.

We could go through and convert the whole module from deferreds to
async, but that should be done separately.
@anoadragon453 (Member Author) commented Oct 2, 2025

190686c makes use of ModuleApi.defer_to_thread, which allows modules to easily run a function on a separate thread. However, it always runs the function on the default threadpool; a module cannot choose a different one.

This module previously used its own threadpool (self._s3_pool), and even had an option to configure the size (threadpool_size, default 40). In previous issues, we've suggested users configure boto3's connection count to match the configured threadpool size: #117

The default threadpool size in Twisted is 10. I worry that switching to Synapse's default threadpool will hurt the performance of this module, with no way for users to configure it.

Perhaps we should additionally expose defer_to_threadpool to modules, allowing them to specify a threadpool? However, a module relying on it would then be incompatible with older versions of Synapse. We could check whether the method exists on ModuleApi and fall back to the default threadpool if not.
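The compatibility check suggested above could look roughly like the following minimal sketch. It assumes that newer Synapse versions expose `ModuleApi.defer_to_threadpool(threadpool, f, *args, **kwargs)`; that exact signature, the helper name `run_on_s3_pool`, and the `OldModuleApi`/`NewModuleApi` stand-ins are all hypothetical. The real methods return Deferreds, whereas the stand-ins return plain tuples so that the dispatch logic can be checked without Synapse.

```python
from typing import Any, Callable


def run_on_s3_pool(module_api: Any, threadpool: Any, f: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Hypothetical helper: prefer the module's own threadpool when supported.

    Assumes newer Synapse exposes
    `ModuleApi.defer_to_threadpool(threadpool, f, *args, **kwargs)` (assumed
    signature) and falls back to `ModuleApi.defer_to_thread(f, *args, **kwargs)`,
    which always uses the default threadpool, on older versions.
    """
    defer_to_threadpool = getattr(module_api, "defer_to_threadpool", None)
    if defer_to_threadpool is not None:
        return defer_to_threadpool(threadpool, f, *args, **kwargs)
    # Older Synapse: only the shared default threadpool is available.
    return module_api.defer_to_thread(f, *args, **kwargs)


class OldModuleApi:
    """Stand-in for an older ModuleApi without defer_to_threadpool."""

    def defer_to_thread(self, f, *args, **kwargs):
        return ("default", f(*args, **kwargs))


class NewModuleApi(OldModuleApi):
    """Stand-in for a newer ModuleApi that accepts a threadpool."""

    def defer_to_threadpool(self, threadpool, f, *args, **kwargs):
        return (threadpool, f(*args, **kwargs))


print(run_on_s3_pool(OldModuleApi(), "s3_pool", pow, 2, 3))  # ('default', 8)
print(run_on_s3_pool(NewModuleApi(), "s3_pool", pow, 2, 3))  # ('s3_pool', 8)
```

Using `getattr` with a default keeps the module importable on older Synapse releases, at the cost of silently losing the configurable pool size there.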

- s3_download_task(
-     self._get_s3_client(), self.bucket, self.prefix + path, self.extra_args, d, logcontext
- )
+ await self._module_api.defer_to_thread(

> This module previously used its own threadpool (self._s3_pool), and even had an option to configure the size (threadpool_size, default 40). In previous issues, we've suggested users configure boto3's connection count to match the configured threadpool size: #117
>
> The default threadpool size in Twisted is 10. I worry that switching to Synapse's default threadpool will hurt the performance of this module, with no way for users to configure it.

-- @anoadragon453, #134 (comment)

Can we recommend people increase the size of the default Twisted threadpool?

Is there any difference in having a separate threadpool?

@MadLittleMods commented Oct 2, 2025

Discussed in #synapse-dev:matrix.org.

Here is why a separate threadpool is important:

> I'd be worried about s3 monopolising the thread pool and blocking other operations (like DNS lookups)

-- @richvdh

As @anoadragon453 points out, DNS lookups already have their own threadpool, but the point still stands in general.
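To illustrate the monopolisation concern, here is a small sketch that uses Python's `concurrent.futures.ThreadPoolExecutor` as a stand-in for Twisted threadpools (both cap the number of worker threads and queue any extra work). The function and task names are hypothetical. When slow S3 work occupies every worker of a shared pool, an unrelated quick task has to wait; with a dedicated S3 pool it runs immediately.

```python
import threading
from concurrent import futures


def slow_s3_download(release: threading.Event) -> str:
    # Stand-in for a long-running S3 transfer: blocks until released.
    release.wait()
    return "s3 done"


def dns_lookup() -> str:
    # Stand-in for a quick, unrelated operation sharing the threadpool.
    return "dns done"


def other_work_finishes(dedicated_s3_pool: bool, pool_size: int = 2) -> bool:
    """Return True if a quick task completes while S3 work saturates its pool."""
    release = threading.Event()
    shared = futures.ThreadPoolExecutor(max_workers=pool_size)
    s3_pool = futures.ThreadPoolExecutor(max_workers=pool_size) if dedicated_s3_pool else shared
    try:
        # Occupy every worker thread in whichever pool the S3 work uses.
        for _ in range(pool_size):
            s3_pool.submit(slow_s3_download, release)
        # The quick task always goes to the shared pool.
        quick = shared.submit(dns_lookup)
        try:
            return quick.result(timeout=0.5) == "dns done"
        except futures.TimeoutError:
            return False  # still queued behind the S3 work
    finally:
        release.set()  # unblock the "downloads" so the pools can shut down
        shared.shutdown(wait=True)
        if dedicated_s3_pool:
            s3_pool.shutdown(wait=True)


print(other_work_finishes(dedicated_s3_pool=False))  # False
print(other_work_finishes(dedicated_s3_pool=True))  # True
```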

@anoadragon453 (Member Author):

> Can we recommend people increase the size of the default Twisted threadpool?

We don't currently have such an option in Synapse, but it would be better than nothing.

Perhaps we should try deploying the module with the default threadpool on matrix.org and see whether performance suffers? My worry is that if we ship without any way to configure the threadpool size, while also requiring people to upgrade this module to use the latest Synapse, they could be stuck between a rock and a hard place.

Otherwise, I think my earlier suggestion may be the way to go:

> Perhaps we should additionally expose defer_to_threadpool to modules, allowing modules to specify a threadpool? Though this would break compatibility with older versions of Synapse. We could check if the method existed on ModuleApi and use the default threadpool if not.

@anoadragon453 (Member Author):

> Can we recommend people increase the size of the default Twisted threadpool?

The size of the default Twisted threadpool can only be increased through code. We don't currently provide a configuration option that tweaks it, so users are unable to increase the size of the default threadpool.


Above, I suggested exposing defer_to_threadpool to modules. Another alternative is to add an argument to the already-exposed ModuleApi.defer_to_thread method that allows specifying a threadpool, defaulting to the default Twisted threadpool if none is provided. Under the hood, it would then use defer_to_threadpool instead of defer_to_thread.
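A rough sketch of that API shape, again using `ThreadPoolExecutor` as a stand-in for Twisted threadpools. `ModuleApiSketch` is a hypothetical class, not Synapse's implementation; it only shows an optional keyword-only `threadpool` argument that falls back to a shared default pool.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Optional


class ModuleApiSketch:
    """Hypothetical stand-in for ModuleApi; not Synapse's real implementation."""

    def __init__(self, default_pool_size: int = 10) -> None:
        # Twisted's reactor threadpool allows at most 10 threads by default.
        self._default_pool = ThreadPoolExecutor(
            max_workers=default_pool_size, thread_name_prefix="default"
        )

    def defer_to_thread(
        self,
        f: Callable[..., Any],
        *args: Any,
        threadpool: Optional[ThreadPoolExecutor] = None,
        **kwargs: Any,
    ) -> "Future[Any]":
        # Use the caller's pool if one is given, otherwise the shared default pool.
        pool = threadpool if threadpool is not None else self._default_pool
        return pool.submit(f, *args, **kwargs)

    def shutdown(self) -> None:
        self._default_pool.shutdown(wait=True)


def current_thread_name() -> str:
    return threading.current_thread().name


api = ModuleApiSketch()
s3_pool = ThreadPoolExecutor(max_workers=40, thread_name_prefix="s3")  # cf. threadpool_size
print(api.defer_to_thread(current_thread_name).result())  # a "default_..." thread
print(api.defer_to_thread(current_thread_name, threadpool=s3_pool).result())  # an "s3_..." thread
s3_pool.shutdown()
api.shutdown()
```

One design wrinkle: a `threadpool` keyword on `defer_to_thread` is consumed by the wrapper, so it cannot be forwarded to a target function that itself takes a `threadpool` argument. A separate method such as `defer_to_threadpool` avoids that ambiguity.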

Sounds workable 👍

@anoadragon453 (Member Author) commented Oct 7, 2025

I've tried the latter approach in element-hq/synapse#19032.

@erikjohnston (Member) commented:

FYI, I tried this branch on jki.re with the latest develop, and no new media is being downloaded. It seems to fail with:

2025-10-03 10:02:43,349 - twisted - 278 - CRITICAL - sentinel - 
Traceback (most recent call last):
  File "/home/erikj/.virtualenvs/synapse311/lib/python3.11/site-packages/s3_storage_provider.py", line 254, in _stream_to_producer
    raise Exception("Timed out waiting to resume")
Exception: Timed out waiting to resume

@anoadragon453 changed the title from "Replace manual LoggingContext usage with ModuleApi.run_in_background" to "Replace manual LoggingContext usage with ModuleApi.defer_to_threadpool" on Oct 8, 2025
Discovered by @erikj:

> the thread is waiting for synapse to use the S3Responder, but the responder isn't returned to Synapse until the thread is finished
> the `d` we pass in to `s3_download_task` gets resolved once we connect to S3, and the thread is concluded only once we finish the download.
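The deadlock described above can be reproduced with a small threading sketch. It swaps Twisted's Deferreds and threadpool for `concurrent.futures.Future` and `ThreadPoolExecutor`, and models the responder as a bounded `queue.Queue` whose back-pressure stands in for the producer waiting to be resumed; `download_task`, `fetch`, and `consume` are hypothetical names. Waiting for the worker thread to finish before handing back the responder times out, while waiting only for the "connected" signal (the role `d` plays) lets the consumer drain the stream.

```python
import queue
from concurrent.futures import Future, ThreadPoolExecutor

CHUNKS = [b"chunk-1", b"chunk-2", b"chunk-3"]


def download_task(connected: Future, out: queue.Queue, resume_timeout: float) -> None:
    """Hypothetical stand-in for s3_download_task, run on a worker thread."""
    # "Connection opened": hand the responder back to the caller. This plays
    # the role of resolving `d` once S3 has responded.
    connected.set_result(out)
    for chunk in CHUNKS + [None]:  # None marks the end of the stream
        try:
            # maxsize=1 models back-pressure: the worker cannot get ahead of
            # the consumer, so it stalls until someone reads from `out`.
            out.put(chunk, timeout=resume_timeout)
        except queue.Full:
            raise Exception("Timed out waiting to resume") from None


def fetch(pool: ThreadPoolExecutor, wait_for_thread: bool, resume_timeout: float = 0.3) -> queue.Queue:
    connected: Future = Future()
    out: queue.Queue = queue.Queue(maxsize=1)
    worker = pool.submit(download_task, connected, out, resume_timeout)
    if wait_for_thread:
        # Buggy ordering: the thread only finishes after the responder has been
        # consumed, but the responder is only returned after the thread finishes.
        worker.result()
    # Fixed ordering: return as soon as the connection is open.
    return connected.result()


def consume(responder: queue.Queue) -> bytes:
    data = b""
    while (chunk := responder.get(timeout=1)) is not None:
        data += chunk
    return data


with ThreadPoolExecutor(max_workers=2) as pool:
    print(consume(fetch(pool, wait_for_thread=False)))  # b'chunk-1chunk-2chunk-3'
    try:
        fetch(pool, wait_for_thread=True)
    except Exception as exc:
        print(exc)  # Timed out waiting to resume
```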
@anoadragon453 merged commit fff398f into main on Oct 9, 2025 (7 checks passed)
@anoadragon453 deleted the anoa/stop_using_logging_context_directly branch October 9, 2025 14:15
@anoadragon453 (Member Author) commented:

Thanks for the reviews both!

My theory as to why this PR passed the integration test CI but failed on @erikjohnston's homeserver is that the integration tests do not delete the media from Synapse's local storage after it's uploaded. As a result, downloads are served from local storage, and the media is never actually fetched from the MinIO S3 storage.

I'll have a look at modifying the tests to actually do that.

Comment on lines +158 to +160
# DO await on `d`, as it will resolve once a connection to S3 has been
# opened. We only want to return to Synapse once we can start streaming
# chunks.

Cross-linking the internal discussion where the cause of the `Exception: Timed out waiting to resume` error was figured out.

@spantaleev commented:

Using defer_to_threadpool makes this (and s3-storage-provider=v1.6.0, which includes this patch) incompatible with Synapse <1.140.0, because defer_to_threadpool was only introduced (for this patch) in Synapse v1.140.0.

While Synapse v1.140.0rc1 warns about s3-storage-provider=v1.6.0 being required to run that Synapse version, we probably also need a similar warning for s3-storage-provider: you can't run s3-storage-provider=v1.6.0 with older Synapse versions.

Users on the current stable Synapse release (v1.139.0) are better off staying with s3-storage-provider=v1.5.0.

@anoadragon453 (Member Author) commented:

@spantaleev thanks for raising! We completely forgot to signal that. I've added a warning to the top of the latest release notes: https://github.com/matrix-org/synapse-s3-storage-provider/releases/tag/v1.6.0.

Synapse 1.140.0 is expected to be released tomorrow (as long as no other regressions are found).
