Fix server_name in logging context for multiple Synapse instances in one process
#18868
This allows us to get access to `server_name` before
which we may want to use in the `with LoggingContext("main"):`
call early on.
This also gives us more flexibility to parse config however
we want and set up a Synapse homeserver, like what we do
in Synapse Pro for Small Hosts.
There are no instances where we don't provide it either
server_name to loggingserver_name in logging context for multiple Synapse instances in one process
We will need this in #18868 but not for now
#18870)

Remove `sentinel` logcontext where we log in `setup`, `start`, and exit.

Instead of having one giant PR that removes all places we use `sentinel` logcontext, I've decided to tackle this more piece-meal. This PR covers the parts if you just startup Synapse and exit it with no requests or activity going on in between.

Part of #18905 (Remove `sentinel` logcontext where we log in Synapse)

Prerequisite for #18868. Logging with the `sentinel` logcontext means we won't know which server the log came from.

### Why

https://github.com/element-hq/synapse/blob/9cc400177822805e2a08d4d934daad6f3bc2a4df/docs/log_contexts.md#L71-L81

(docs updated in #18900)

### Testing strategy

1. Run Synapse normally and with `daemonize: true`: `poetry run synapse_homeserver --config-path homeserver.yaml`
1. Execute some requests
1. Shutdown the server
1. Look for any bad log entries in your homeserver logs:
    - `Expected logging context sentinel but found main`
    - `Expected logging context main was lost`
    - `Expected previous context`
    - `utime went backwards!`/`stime went backwards!`
    - `Called stop on logcontext POST-0 without recording a start rusage`
1. Look for any logs coming from the `sentinel` context

With these changes, you should only see the following logs (not from Synapse) using the `sentinel` context if you start up Synapse and exit:

`homeserver.log`
```
2025-09-10 14:45:39,924 - asyncio - 64 - DEBUG - sentinel - Using selector: EpollSelector
2025-09-10 14:45:40,562 - twisted - 281 - INFO - sentinel - Received SIGINT, shutting down.
2025-09-10 14:45:40,562 - twisted - 281 - INFO - sentinel - (TCP Port 9322 Closed)
2025-09-10 14:45:40,563 - twisted - 281 - INFO - sentinel - (TCP Port 8008 Closed)
2025-09-10 14:45:40,563 - twisted - 281 - INFO - sentinel - (TCP Port 9093 Closed)
2025-09-10 14:45:40,564 - twisted - 281 - INFO - sentinel - Main loop terminated.
```
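The last two steps of the testing strategy above could be automated with something like the following. This is only a sketch: the `scan_log` helper and the regular expression are hypothetical, the line layout is inferred from the `homeserver.log` sample, and treating any logger whose name starts with `synapse` as "from Synapse" is an assumption.

```python
import re
from typing import Iterable, List, Tuple

# Substrings taken from the list of bad log entries in the testing strategy.
# "Called stop on logcontext" is matched as a prefix so that it is not tied
# to the specific `POST-0` context name.
BAD_LOG_PATTERNS = [
    "Expected logging context sentinel but found main",
    "Expected logging context main was lost",
    "Expected previous context",
    "utime went backwards!",
    "stime went backwards!",
    "Called stop on logcontext",
]

# Assumed layout, inferred from the sample log:
# "<date> <time> - <logger> - <line> - <level> - <logcontext> - <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+ \S+) - (?P<logger>\S+) - (?P<lineno>\d+) - "
    r"(?P<level>\w+) - (?P<context>\S+) - (?P<message>.*)$"
)


def scan_log(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (bad_lines, synapse_lines_logged_from_sentinel)."""
    bad: List[str] = []
    sentinel: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if any(pattern in line for pattern in BAD_LOG_PATTERNS):
            bad.append(line)
        match = LOG_LINE.match(line)
        # Assumption: Synapse's own loggers are named `synapse.*`; lines from
        # `asyncio` or `twisted` in the sentinel context are expected.
        if match and match["context"] == "sentinel" and match["logger"].startswith("synapse"):
            sentinel.append(line)
    return bad, sentinel
```

To check a real log file, pass an open file handle, for example `scan_log(open("homeserver.log"))`, and expect both returned lists to be empty.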
This allows us to get access to `server_name` so we can use it when
creating the `LoggingContext("main")` in the future (pre-requisite for
#18868).
This also gives us more flexibility to parse config however we want and
set up a Synapse homeserver, like what we do in [Synapse Pro for Small
Hosts](https://github.com/element-hq/synapse-small-hosts).
Split out from #18868
Conflicts:
    synapse/_scripts/synapse_port_db.py
    synapse/app/admin_cmd.py
    synapse/app/appservice.py
    synapse/app/client_reader.py
    synapse/app/event_creator.py
    synapse/app/federation_reader.py
    synapse/app/federation_sender.py
    synapse/app/frontend_proxy.py
    synapse/app/generic_worker.py
    synapse/app/homeserver.py
    synapse/app/media_repository.py
    synapse/app/pusher.py
    synapse/app/synchrotron.py
    synapse/app/user_dir.py
    synapse/logging/context.py
    tests/handlers/test_federation.py
    tests/util/test_logcontext.py
(downstream hasn't been updated yet)
Fix #12841 (originally matrix-org/synapse#12841)
    def __init__(self, request: str = ""):
        self._default_request = request
We previously recommended adding this to your logging config. It became unnecessary after matrix-org/synapse#8051, although it was merely redundant and harmless if you kept it.
    filters:
        context:
            (): synapse.logging.context.LoggingContextFilter
            request: ""
If we remove this `__init__` constructor, we start to see exceptions like #18868 (comment), so we should consider just reverting and putting it back in place for now.
Since it's been 5 years since we started adding `LoggingContextFilter` automatically for people, we could consider making a breaking change, but it would be best to do this in a separate PR so the decision is easier to track.
I only tried to remove it because it seemed useless.
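To illustrate why removing the constructor argument breaks old configs: with a `()` factory entry, the standard library's `logging.config.dictConfig` passes the remaining keys (here `request`) as keyword arguments to the factory, so a filter that no longer accepts `request` fails at configuration time. The two filter classes below are hypothetical stand-ins, not Synapse's `LoggingContextFilter`.

```python
import logging
import logging.config


class FilterWithRequestArg(logging.Filter):
    """Stand-in for a filter that still accepts the legacy `request` kwarg."""

    def __init__(self, request: str = "") -> None:
        super().__init__()
        self._default_request = request


class FilterWithoutRequestArg(logging.Filter):
    """Stand-in for a filter whose constructor no longer accepts `request`."""

    def __init__(self) -> None:
        super().__init__()


def build_config(filter_factory) -> dict:
    # Mirrors the historically recommended config: a `context` filter
    # built via the `()` factory key with a `request: ""` argument.
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "filters": {"context": {"()": filter_factory, "request": ""}},
    }


def try_configure(filter_factory) -> bool:
    """Return True if the legacy-style config still loads."""
    try:
        logging.config.dictConfig(build_config(filter_factory))
        return True
    except ValueError:
        # dictConfig wraps the underlying TypeError in a ValueError.
        return False
```

Here `try_configure(FilterWithRequestArg)` succeeds, while `try_configure(FilterWithoutRequestArg)` fails because of the unexpected `request` keyword argument.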
Restored LoggingContextFilter.__init__ constructor for now ⏩
I've also added all of the history context in a comment
fabf85e to 44fa84f
    @@ -0,0 +1 @@
    + Fix `server_name` in logging context for multiple Synapse instances in one process.
Progressed outside of this PR: We have removed enough sentinel logcontext usage in Synapse for this to be useful (tracked by #18905).
With this PR, we will be able to distinguish which server sent the logs wherever we're not using the `sentinel` logcontext. And if we find any more `sentinel` logcontext usage, we can fix it up piecemeal like normal.
This looked like a scary PR numbers-wise but was actually pretty approachable since it was almost entirely just straightforward plumbing; a real gold star for structuring your PR sequence to make this possible, so thanks!
    - def __init__(self, request: str = ""):
    + def __init__(
    +     self,
    +     # `request` is here for backwards compatibility since we previously recommended
nice comment, thanks :)
    - clock = Clock(cast(ISynapseThreadlessReactor, reactor))
    + clock = Clock(
    +     cast(ISynapseThreadlessReactor, reactor),
    +     server_name="synapse_module_running_from_unknown_server",
Is it possible to track which server this is running on? Such information would be useful if a particular module is misbehaving.
While we could do a pattern like the following today because we expose ModuleApi.server_name(), I'd prefer not to make a further mess.
Modified DirectServeJsonResource

    class DirectServeJsonResource(_AsyncResource):
        """A resource that will call `self._async_on_<METHOD>` on new requests,
        formatting responses and errors as JSON.
        """

        def __init__(
            self,
            canonical_json: bool = False,
            extract_context: bool = False,
            # Clock is optional as this class is exposed to the module API.
            clock: Optional[Clock] = None,
            # This is only necessary for Module API users who don't pass in a `clock`.
            server_name: Optional[str] = None,
        ):
            """
            Args:
                canonical_json: TODO
                extract_context: TODO
                clock: This is expected to be passed in by any Synapse code.
                    Only optional for the Module API.
                server_name: The homeserver name (this should be `ModuleApi.server_name()`).
                    Only used for the Module API if `clock` is not passed in.
            """

            if clock is None:
                clock = Clock(
                    cast(ISynapseThreadlessReactor, reactor),
                    server_name=server_name
                    if server_name is not None
                    else "synapse_module_running_from_unknown_server",
                )
            else:
                assert server_name is None, (
                    "No need to pass in `server_name` if clock is set"
                )

            super().__init__(clock, extract_context)
            self.canonical_json = canonical_json

Usage:

    class SomeResource(DirectServeJsonResource):
        def __init__(
            self,
            module_api: ModuleApi,
        ):
            super().__init__(server_name=module_api.server_name())


    class SomeSynapseModule:
        """
        Synapse module that TODO
        """

        def __init__(
            self, config: MyModuleConfig, module_api: ModuleApi
        ):
            # Keep a reference to the config and Module API
            self._module_api = module_api

            # "Modules **must** register their web resources in their `__init__` method."
            # (https://github.com/element-hq/synapse/blob/081f6ad50fa0ea87c348778e8be40517da25c698/docs/modules/writing_a_module.md#L69)
            self._module_api.register_web_resource(
                SomeResource(
                    self._module_api
                ),
            )

The ideal solution would be using the `hs.get_clock()` and passing in the `Clock` directly. See #18868 (comment) for why the homeserver clock is ideal. We currently don't expose `Clock` to module API consumers and maybe we shouldn't as it's best for them to continue using the dedicated `ModuleApi` equivalents.
Overall, this requires much more thought. We should perhaps also hand out fewer raw interfaces like `DirectServeJsonResource` in the module API, as they make doing the right thing here practically impossible without breaking changes.
This hasn't been necessary for 5 years since matrix-org/synapse#8051 because we automatically configure this for you within Synapse itself. Spawning from seeing this fail after we tried to change the `LoggingContextFilter` constructor in element-hq/synapse#18868 (comment). Although this served as a decent canary of what people may have historically configured, I've now added the relevant context to that part of the code as part of element-hq/synapse#18868
Thanks for the detailed explanations. Agreed that this should be a lesson for future module API expansion.
This PR LGTM!
    - real_clock = Clock(cast(ISynapseThreadlessReactor, reactor))
    + real_clock = Clock(
    +     cast(ISynapseThreadlessReactor, reactor), server_name=server_name
    + )
Based on the resolved conversation in the Linearizer section, wouldn't that apply here as well?
I assume you're talking about #18868 (comment)
Yes, ideally we'd use the homeserver Clock 👍
We already had access to server_name here and clock wasn't being passed in at all like the Linearizer had.
Makes sense. Thanks 🙏
Conflicts: tests/util/test_async_helpers.py
Thanks for the review @reivilibre, @anoadragon453, and @jason-famedly 🐉
Deployments that make use of the [synapse-s3-storage-provider](https://github.com/matrix-org/synapse-s3-storage-provider) module must upgrade to [v1.6.0](https://github.com/matrix-org/synapse-s3-storage-provider/releases/tag/v1.6.0). Using older versions of the module with this release of Synapse will prevent users from being able to upload or download media. No significant changes since 1.140.0rc1. - Add [a new Media Query by ID Admin API](https://element-hq.github.io/synapse/v1.140/admin_api/media_admin_api.html#query-a-piece-of-media-by-id) that allows server admins to query and investigate the metadata of local or cached remote media via the `origin/media_id` identifier found in a [Matrix Content URI](https://spec.matrix.org/v1.14/client-server-api/#matrix-content-mxc-uris). ([\element-hq#18911](element-hq#18911)) - Add [a new Fetch Event Admin API](https://element-hq.github.io/synapse/v1.140/admin_api/fetch_event.html) to fetch an event by ID. ([\element-hq#18963](element-hq#18963)) - Update [MSC4284: Policy Servers](matrix-org/matrix-spec-proposals#4284) implementation to support signatures when available. ([\element-hq#18934](element-hq#18934)) - Add experimental implementation of the `GET /_matrix/client/v1/rtc/transports` endpoint for the latest draft of [MSC4143: MatrixRTC](matrix-org/matrix-spec-proposals#4143). ([\element-hq#18967](element-hq#18967)) - Expose a `defer_to_threadpool` function in the Synapse Module API that allows modules to run a function on a separate thread in a custom threadpool. ([\element-hq#19032](element-hq#19032)) - Fix room upgrade `room_config` argument and documentation for `user_may_create_room` spam-checker callback. ([\element-hq#18721](element-hq#18721)) - Compute a user's last seen timestamp from their devices' last seen timestamps instead of IPs, because the latter are automatically cleared according to `user_ips_max_age`. 
([\element-hq#18948](element-hq#18948)) - Fix bug where ephemeral events were not filtered by room ID. Contributed by @frastefanini. ([\element-hq#19002](element-hq#19002)) - Update Synapse main process version string to include git info. ([\element-hq#19011](element-hq#19011)) - Explain how `Deferred` callbacks interact with logcontexts. ([\element-hq#18914](element-hq#18914)) - Fix documentation for `rc_room_creation` and `rc_reports` to clarify that a `per_user` rate limit is not supported. ([\element-hq#18998](element-hq#18998)) - Remove deprecated `LoggingContext.set_current_context`/`LoggingContext.current_context` methods which already have equivalent bare methods in `synapse.logging.context`. ([\element-hq#18989](element-hq#18989)) - Drop support for unstable field names from the long-accepted [MSC2732](matrix-org/matrix-spec-proposals#2732) (Olm fallback keys) proposal. ([\element-hq#18996](element-hq#18996)) - Cleanly shutdown `SynapseHomeServer` object, allowing artifacts of embedded small hosts to be properly garbage collected. ([\element-hq#18828](element-hq#18828)) - Update OEmbed providers to use 'X' instead of 'Twitter' in URL previews, following a rebrand. Contributed by @HammyHavoc. ([\element-hq#18767](element-hq#18767)) - Fix `server_name` in logging context for multiple Synapse instances in one process. ([\element-hq#18868](element-hq#18868)) - Wrap the Rust HTTP client with `make_deferred_yieldable` so it follows Synapse logcontext rules. ([\element-hq#18903](element-hq#18903)) - Fix the GitHub Actions workflow that moves issues labeled "X-Needs-Info" to the "Needs info" column on the team's internal triage board. ([\element-hq#18913](element-hq#18913)) - Disconnect background process work from request trace. ([\element-hq#18932](element-hq#18932)) - Reduce overall number of calls to `_get_e2e_cross_signing_signatures_for_devices` by increasing the batch size of devices the query is called with, reducing DB load. 
([\element-hq#18939](element-hq#18939)) - Update error code used when an appservice tries to masquerade as an unknown device using [MSC4326](matrix-org/matrix-spec-proposals#4326). Contributed by @tulir @ Beeper. ([\element-hq#18947](element-hq#18947)) - Fix `no active span when trying to log` tracing error on startup (when OpenTracing is enabled). ([\element-hq#18959](element-hq#18959)) - Fix `run_coroutine_in_background(...)` incorrectly handling logcontext. ([\element-hq#18964](element-hq#18964)) - Add debug logs wherever we change current logcontext. ([\element-hq#18966](element-hq#18966)) - Update dockerfile metadata to fix broken link; point to documentation website. ([\element-hq#18971](element-hq#18971)) - Note that the code is additionally licensed under the [Element Commercial license](https://github.com/element-hq/synapse/blob/develop/LICENSE-COMMERCIAL) in SPDX expression field configs. ([\element-hq#18973](element-hq#18973)) - Fix logcontext handling in `timeout_deferred` tests. ([\element-hq#18974](element-hq#18974)) - Remove internal `ReplicationUploadKeysForUserRestServlet` as a follow-up to the work in element-hq#18581 that moved device changes off the main process. ([\element-hq#18988](element-hq#18988)) - Switch task scheduler from raw logcontext manipulation to using the dedicated logcontext utils. ([\element-hq#18990](element-hq#18990)) - Remove `MockClock()` in tests. ([\element-hq#18992](element-hq#18992)) - Switch back to our own custom `LogContextScopeManager` instead of OpenTracing's `ContextVarsScopeManager` which was causing problems when using the experimental `SYNAPSE_ASYNC_IO_REACTOR` option with tracing enabled. ([\element-hq#19007](element-hq#19007)) - Remove `version_string` argument from `HomeServer` since it's always the same. ([\element-hq#19012](element-hq#19012)) - Remove duplicate call to `hs.start_background_tasks()` introduced from a bad merge. 
([\element-hq#19013](element-hq#19013)) - Split homeserver creation (`create_homeserver`) and setup (`setup`). ([\element-hq#19015](element-hq#19015)) - Swap near-end-of-life `macos-13` GitHub Actions runner for the `macos-15-intel` variant. ([\element-hq#19025](element-hq#19025)) - Introduce `RootConfig.validate_config()` which can be subclassed in `HomeServerConfig` to do cross-config class validation. ([\element-hq#19027](element-hq#19027)) - Allow any command of the `release.py` script to accept a `--gh-token` argument. ([\element-hq#19035](element-hq#19035)) * Bump Swatinem/rust-cache from 2.8.0 to 2.8.1. ([\element-hq#18949](element-hq#18949)) * Bump actions/cache from 4.2.4 to 4.3.0. ([\element-hq#18983](element-hq#18983)) * Bump anyhow from 1.0.99 to 1.0.100. ([\element-hq#18950](element-hq#18950)) * Bump authlib from 1.6.3 to 1.6.4. ([\element-hq#18957](element-hq#18957)) * Bump authlib from 1.6.4 to 1.6.5. ([\element-hq#19019](element-hq#19019)) * Bump bcrypt from 4.3.0 to 5.0.0. ([\element-hq#18984](element-hq#18984)) * Bump docker/login-action from 3.5.0 to 3.6.0. ([\element-hq#18978](element-hq#18978)) * Bump lxml from 6.0.0 to 6.0.2. ([\element-hq#18979](element-hq#18979)) * Bump phonenumbers from 9.0.13 to 9.0.14. ([\element-hq#18954](element-hq#18954)) * Bump phonenumbers from 9.0.14 to 9.0.15. ([\element-hq#18991](element-hq#18991)) * Bump prometheus-client from 0.22.1 to 0.23.1. ([\element-hq#19016](element-hq#19016)) * Bump pydantic from 2.11.9 to 2.11.10. ([\element-hq#19017](element-hq#19017)) * Bump pygithub from 2.7.0 to 2.8.1. ([\element-hq#18952](element-hq#18952)) * Bump regex from 1.11.2 to 1.11.3. ([\element-hq#18981](element-hq#18981)) * Bump serde from 1.0.224 to 1.0.226. ([\element-hq#18953](element-hq#18953)) * Bump serde from 1.0.226 to 1.0.228. ([\element-hq#18982](element-hq#18982)) * Bump setuptools-rust from 1.11.1 to 1.12.0. ([\element-hq#18980](element-hq#18980)) * Bump twine from 6.1.0 to 6.2.0. 
([\element-hq#18985](element-hq#18985)) * Bump types-pyyaml from 6.0.12.20250809 to 6.0.12.20250915. ([\element-hq#19018](element-hq#19018)) * Bump types-requests from 2.32.4.20250809 to 2.32.4.20250913. ([\element-hq#18951](element-hq#18951)) * Bump typing-extensions from 4.14.1 to 4.15.0. ([\element-hq#18956](element-hq#18956))
Background
As part of Element's plan to support a light form of vhosting (virtual host) (multiple instances of Synapse in the same Python process), we're currently diving into the details and implications of running multiple instances of Synapse in the same Python process.
"Per-tenant logging" tracked internally by https://github.com/element-hq/synapse-small-hosts/issues/48
Prior art
Previously, we exposed `server_name` by providing a static logging `MetadataFilter` that injected the values:

synapse/synapse/config/logger.py
Line 216 in 205d9e4
While this can work fine for the normal case of one Synapse instance per Python process, this configures things globally and isn't compatible when we try to start multiple Synapse instances because each subsequent tenant will overwrite the previous tenant.
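A minimal sketch of that overwrite problem, using only the standard library. `StaticMetadataFilter`, `ListHandler`, and `setup_tenant` are hypothetical stand-ins for illustration, not Synapse's actual `MetadataFilter` or logging setup; the point is only that a static value attached to process-wide logging state cannot differ per tenant.

```python
import logging


class StaticMetadataFilter(logging.Filter):
    """Stamps the same fixed values onto every record it sees."""

    def __init__(self, metadata: dict) -> None:
        super().__init__()
        self._metadata = metadata

    def filter(self, record: logging.LogRecord) -> bool:
        for key, value in self._metadata.items():
            setattr(record, key, value)
        return True


class ListHandler(logging.Handler):
    """Collects (message, server_name) pairs instead of writing them out."""

    def __init__(self) -> None:
        super().__init__()
        self.seen = []

    def emit(self, record: logging.LogRecord) -> None:
        self.seen.append((record.getMessage(), record.server_name))


def setup_tenant(handler: logging.Handler, server_name: str) -> None:
    # Each tenant configures the shared, process-wide handler.
    handler.addFilter(StaticMetadataFilter({"server_name": server_name}))


handler = ListHandler()
logger = logging.getLogger("prior_art_sketch")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

setup_tenant(handler, "hs1.example.com")
setup_tenant(handler, "hs2.example.com")

logger.info("message from hs1")
logger.info("message from hs2")
```

Both records in `handler.seen` end up stamped with `hs2.example.com`, because the filter added by the second tenant runs last and overwrites the value set by the first.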
What does this PR do?
We remove the `MetadataFilter` and replace it by tracking the `server_name` in the `LoggingContext` and expose it with our existing `LoggingContextFilter` that we already use to expose information about the `request`.

This means that the `server_name` value follows wherever we log as expected even when we have multiple Synapse instances running in the same process.

A note on logcontext
Anywhere Synapse mistakenly uses the `sentinel` logcontext to log something, we won't know which server sent the log. We've been fixing up `sentinel` logcontext usage as tracked by #18905.

Any further `sentinel` logcontext usage we find in the future can be fixed piecemeal as normal.

synapse/docs/log_contexts.md
Lines 71 to 81 in d2a966f
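The idea of carrying `server_name` in the logging context, with a fallback for the `sentinel` case, can be sketched roughly as follows. This is an assumption-laden illustration: it uses `contextvars` as a stand-in mechanism, and `SketchLoggingContext` and `SketchContextFilter` are hypothetical names, not Synapse's `LoggingContext` or `LoggingContextFilter`. Only the `unknown_server_from_sentinel_context` placeholder value comes from this PR.

```python
import contextvars
import logging

# Value reported when nothing has entered a logging context.
SENTINEL_SERVER_NAME = "unknown_server_from_sentinel_context"

_current_server_name = contextvars.ContextVar(
    "server_name", default=SENTINEL_SERVER_NAME
)


class SketchLoggingContext:
    """Context manager that records which server the current code runs for."""

    def __init__(self, name: str, server_name: str) -> None:
        self.name = name
        self.server_name = server_name

    def __enter__(self) -> "SketchLoggingContext":
        self._token = _current_server_name.set(self.server_name)
        return self

    def __exit__(self, *exc_info) -> None:
        # Restore whatever was active before (possibly the sentinel default).
        _current_server_name.reset(self._token)


class SketchContextFilter(logging.Filter):
    """Copies the current context's server name onto each log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.server_name = _current_server_name.get()
        return True


class ListHandler(logging.Handler):
    """Collects formatted lines so the result is easy to inspect."""

    def __init__(self) -> None:
        super().__init__()
        self.lines = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


handler = ListHandler()
handler.setFormatter(logging.Formatter("%(server_name)s - %(message)s"))
handler.addFilter(SketchContextFilter())

logger = logging.getLogger("logcontext_sketch")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

with SketchLoggingContext("main", server_name="hs1.example.com"):
    logger.info("hello from hs1")
with SketchLoggingContext("main", server_name="hs2.example.com"):
    logger.info("hello from hs2")
logger.info("logged outside any context")
```

Unlike a static filter, the single shared filter here reads the value at log time, so two tenants in one process each get their own `server_name`, and anything logged outside a context falls back to the placeholder.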
Testing strategy

1. Use `%(server_name)s` in the format
1. Run `poetry run synapse_homeserver --config-path homeserver.yaml`
1. Make some requests (`curl http://localhost:8008/_matrix/client/versions`, etc)
1. Check for `server_name` in the logs as expected. `unknown_server_from_sentinel_context` is expected for the `sentinel` logcontext (things outside of Synapse).

Dev notes
- `DirectServeJsonResource` #18600
- `Linearizer` uses a different reactor than the homeserver during tests matrix-org/synapse#12841
- `LoggingContextFilter`) Auto set logging filter matrix-org/synapse#8051

Todo

- `sentinel` logcontext usage (`unknown_server_from_sentinel_context`), Remove `sentinel` logcontext where we log in Synapse #18905