-
Notifications
You must be signed in to change notification settings - Fork 7
Description
We've noticed some unusually long delays (>1 min) to indexing of data in the default homeserver in staging environment. This means that is takes over a minute for a new post to show in the UI, meanwhile showing "post deleted" message, which would suggest to the user that they failed to post.
This appears to align with recent changes to allow nexus to index other homeservers. Although this long delay was not noticed during initial testing, it would seem that monitored homeservers going offline are the cause. See evidence below to support this claim:
Nexus is periodically logging (approx every 90s) the following:
Nov 19 12:09:58 nexus-service.europe-west6-b.c.pubky-stag.internal nexusd[213875]: 2025-11-19T12:09:58.228105Z INFO nexus_watcher::service::traits::tevent_processor_runner:65: Event processor run for HS ufibwbmed6jeq9k4p583go95wofakh9fwpp4k734trq79pd9u1uy: duration 66.857114ms, status Ok
Nov 19 12:09:58 nexus-service.europe-west6-b.c.pubky-stag.internal nexusd[213875]: 2025-11-19T12:09:58.228116Z INFO nexus_watcher::service::traits::tevent_processor_runner:65: Event processor run for HS 5mgtatzz9qxdmbr4hy5tkazykqbim5dzmgf94fyw6ogyioij6duy: duration 30.135822823s, status Error
Nov 19 12:09:58 nexus-service.europe-west6-b.c.pubky-stag.internal nexusd[213875]: 2025-11-19T12:09:58.228124Z INFO nexus_watcher::service::traits::tevent_processor_runner:65: Event processor run for HS 8pinxxgqs41n4aididenw5apqp1urfmzdztr8jt4abrkdn435ewo: duration 2.049655ms, status Error
Nov 19 12:09:58 nexus-service.europe-west6-b.c.pubky-stag.internal nexusd[213875]: 2025-11-19T12:09:58.228132Z INFO nexus_watcher::service::traits::tevent_processor_runner:65: Event processor run for HS 8um71us3fyw6h8wbcxb5ar3rwusy1a6u49956ikzojg3gcwd1dty: duration 64.931395ms, status Ok
Nov 19 12:09:58 nexus-service.europe-west6-b.c.pubky-stag.internal nexusd[213875]: 2025-11-19T12:09:58.228140Z INFO nexus_watcher::service::traits::tevent_processor_runner:65: Event processor run for HS cei833be515cqmpifxb5fesk87u9sbemjg5ag4n7fdpkq6y7o8wy: duration 30.140947325s, status Error
Nov 19 12:09:58 nexus-service.europe-west6-b.c.pubky-stag.internal nexusd[213875]: 2025-11-19T12:09:58.228149Z INFO nexus_watcher::service::traits::tevent_processor_runner:65: Event processor run for HS ne6o3zyjabgh5medxsh7qqeipcpcs8dnjespy5rwgfgsr4ggxk8o: duration 30.208006287s, status Error
as you can see, 3 of the homeservers are erroring after 30 seconds whilst the others are either successful of error after a fraction of a second. This aligns with the perceived >1 min delay to posts being indexed (rough average based on timing of post being submitted).
Preceding this we see the timeout for each attempt to connect to each of the three homeservers at 30s intervals:
Nov 19 12:08:57 nexus-service.europe-west6-b.c.pubky-stag.internal nexusd[213875]: 2025-11-19T12:08:57.812141Z ERROR nexus_watcher::service::traits::tevent_processor:76: Event processor failed for 5mgtatzz9qxdmbr4hy5tkazykqbim5dzmgf94fyw6ogyioij6duy: PubkyClientError { message: "Some(hyper_util::client::legacy::Error(Connect, ConnectError(\"tcp connect error\", 213.160.75.186:6287, Os { code: 110, kind: TimedOut, message: \"Connection timed out\" })))" }
...
Nov 19 12:09:28 nexus-service.europe-west6-b.c.pubky-stag.internal nexusd[213875]: 2025-11-19T12:09:28.020059Z ERROR nexus_watcher::service::traits::tevent_processor:76: Event processor failed for cei833be515cqmpifxb5fesk87u9sbemjg5ag4n7fdpkq6y7o8wy: PubkyClientError { message: "Some(hyper_util::client::legacy::Error(Connect, ConnectError(\"tcp connect error\", 213.160.75.188:6287, Os { code: 110, kind: TimedOut, message: \"Connection timed out\" })))" }
...
Nov 19 12:09:58 nexus-service.europe-west6-b.c.pubky-stag.internal nexusd[213875]: 2025-11-19T12:09:58.228083Z ERROR nexus_watcher::service::traits::tevent_processor:76: Event processor failed for ne6o3zyjabgh5medxsh7qqeipcpcs8dnjespy5rwgfgsr4ggxk8o: PubkyClientError { message: "Some(hyper_util::client::legacy::Error(Connect, ConnectError(\"tcp connect error\", 213.160.75.187:6287, Os { code: 110, kind: TimedOut, message: \"Connection timed out\" })))" }
Finally, configuring nexus with monitored_homeservers_limit = 1 results in no delays to indexing new posts.
Therefore, it's fairly certain that these connection timeouts are the cause of delays to indexing from online homeserver, which is most noticeable with the default homeserver in normal use.
We already have a maximum number of monitored homeserver that would also limit the maximum delay causes by this. To reduce the impact further and reduce the attack surface we'd want to back off retries when unreachable, and possibly eventually forget it (pubky and IP/domain) altogether and consider it unreliable even if its users pubky continue being referenced.