This repository was archived by the owner on Apr 26, 2024. It is now read-only.
Draft: /messages investigation scratch pad1
#13440 (Closed)
MadLittleMods wants to merge 25 commits into madlittlemods/11850-migrate-to-opentelemetry from madlittlemods/13356-messages-investigation-scratch-v1
Conversation
…lemods/13356-messages-investigation-scratch-v1 Conflicts: synapse/api/auth.py
MadLittleMods commented on Aug 3, 2022
Comment on lines +201 to +203:

# It does not seem like the agent can keep up with the massive UDP load
# (1065 spans in one trace) so lets just use the HTTP collector endpoint
# instead which seems to work.
I wonder why this is the case? I was seeing this same behavior with the Jaeger opentracing stuff. Is the UDP connection being oversaturated? Can the Jaeger agent in Docker not keep up? We see some spans come over, but never the main overarching servlet span, which is probably the last to be exported.
But using the HTTP Jaeger collector endpoint seems to work fine for getting the whole trace.
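For reference, here is a minimal sketch of what that switch looks like with the OpenTelemetry Python SDK's Jaeger Thrift exporter, assuming the default Jaeger ports (6831/udp for the agent, 14268 for the HTTP collector). This is not the exact configuration from this branch, just an illustration of the two export paths being compared.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.jaeger.thrift import JaegerExporter

# UDP agent export: in practice, parts of very large traces (~1000+ spans) got dropped.
# exporter = JaegerExporter(agent_host_name="localhost", agent_port=6831)

# HTTP collector export, which reliably delivered the whole trace.
exporter = JaegerExporter(collector_endpoint="http://localhost:14268/api/traces")

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
```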
…ittlemods/13356-messages-investigation-scratch-v1
MadLittleMods commented on Aug 8, 2022
…ittlemods/13356-messages-investigation-scratch-v1 Conflicts: pyproject.toml synapse/logging/tracing.py
MadLittleMods added a commit that referenced this pull request on Aug 16, 2022
…ittlemods/13356-messages-investigation-scratch-v1 Conflicts: synapse/federation/federation_client.py synapse/handlers/federation.py synapse/handlers/federation_event.py synapse/logging/tracing.py synapse/storage/controllers/persist_events.py synapse/storage/controllers/state.py synapse/storage/databases/main/events_worker.py synapse/util/ratelimitutils.py
…ittlemods/13356-messages-investigation-scratch-v1 Conflicts: poetry.lock synapse/handlers/federation.py
…ittlemods/13356-messages-investigation-scratch-v1
@MadLittleMods Is this useful or have you gleaned everything you can from it?
…ittlemods/13356-messages-investigation-scratch-v1 Conflicts: synapse/handlers/federation.py synapse/handlers/relations.py
Labels
A-Messages-Endpoint
/messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill)
T-Task
Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Part of #13356
Combine:
- `/messages` `@trace` decorations
- Instrument `/messages` for understandable traces in Jaeger (#13368)

So that I can run against the Complement federation tests and see if there is more to add `@trace` to in the federation stack of things when `/messages` happens.

Optimization ideas
We load a lot of state (from 2. in #13356)

In #matrixhq there are 40k current members and I assume `get_current_state` is the root cause of why we `Loaded 79277 events` (seems like that took 17s too). We only call `get_current_state` in order to get a list of likely domains to backfill from.

We could optimize this by:
- Caching `get_domains_from_state` so we don't have to call `get_current_state` as much
- Running `get_domains_from_state` in the background so it's ready by the time we fail with the first couple of domains (see the sketch after this list)
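A minimal asyncio sketch of the second idea, using hypothetical stand-ins (`load_current_state`, `try_backfill_from`, and a toy `get_domains_from_state`) rather than Synapse's real helpers: the expensive domain lookup is kicked off immediately, but we only wait for it once the cheap first attempts have failed.

```python
import asyncio

# Hypothetical stand-ins for the real Synapse helpers in synapse/handlers/federation.py.
async def load_current_state(room_id: str) -> dict:
    await asyncio.sleep(1)  # stands in for loading ~80k membership events
    return {("m.room.member", "@alice:remote.example"): "join"}

def get_domains_from_state(state: dict) -> list[str]:
    # State keys are (event_type, state_key); for memberships the state_key is a user ID.
    return sorted({user_id.split(":", 1)[1] for typ, user_id in state if typ == "m.room.member"})

async def try_backfill_from(domain: str) -> bool:
    return False  # stands in for an actual /backfill attempt against one server

async def likely_domains(room_id: str) -> list[str]:
    return get_domains_from_state(await load_current_state(room_id))

async def backfill(room_id: str, preferred_domains: list[str]) -> None:
    # Start the expensive domain calculation immediately, without blocking on it.
    domains_task = asyncio.create_task(likely_domains(room_id))
    for domain in preferred_domains:
        if await try_backfill_from(domain):
            domains_task.cancel()
            return
    # Only now do we need the full list; by this point it is usually ready.
    for domain in await domains_task:
        if await try_backfill_from(domain):
            return

asyncio.run(backfill("!room:example.org", ["matrix.org"]))
```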
Skip backfill

Skip backfill, or kick it off in the background, if it's not our first time and we have enough events.
We don't want to get stuck on the same unfetchable event over and over.
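A rough sketch of that heuristic, with hypothetical helper names and threshold (not Synapse's actual code): if we've tried this gap before and already have enough events to serve, run the backfill in the background instead of blocking the `/messages` response on it.

```python
import asyncio

async def do_backfill(room_id: str) -> None:
    await asyncio.sleep(0)  # stands in for the real backfill attempt

async def maybe_backfill(room_id: str, num_events_found: int, attempted_before: bool) -> None:
    ENOUGH_EVENTS = 10  # hypothetical threshold, not Synapse's actual limit
    if attempted_before and num_events_found >= ENOUGH_EVENTS:
        # Don't block the /messages response on a backfill that may be stuck
        # chasing the same unfetchable event; run it in the background instead.
        asyncio.create_task(do_backfill(room_id))
        return
    await do_backfill(room_id)
```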
Why is `/state_ids` slow to respond?

We can't control every bad network effect, but maybe Synapse is slow to assemble a `/state_ids` response 🤔 Need to investigate `FederationStateIdsServlet` (FederationStateIdsServlet - `/state_ids`, #13499).
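For context, a `/state_ids` response (per the Matrix server-server API) is just two flat lists of event IDs, but in a room the size of #matrixhq the state list alone runs to tens of thousands of entries, so both building and handling it are non-trivial. This is a shape-only sketch with made-up IDs.

```python
# Shape of GET /_matrix/federation/v1/state_ids/{roomId}?event_id={eventId}
# (illustrative only; real responses for a room like #matrixhq contain tens of thousands of IDs)
state_ids_response = {
    # IDs of every state event in the room at the requested event
    "pdu_ids": ["$event_id_1", "$event_id_2"],
    # IDs of the full auth chain for that state
    "auth_chain_ids": ["$auth_event_id_1", "$auth_event_id_2"],
}
```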
We should only care about `auth_event_ids`

We should only care about getting the `event_id` and `auth_event_ids` in `_get_state_ids_after_missing_prev_event(...)`. We shouldn't factor `state_event_ids` into whether …

Dev notes
Jaeger max duration spans: 213503982d 8h, see #13440 (comment)

Pull Request Checklist
- Pull request is based on the develop branch
- Pull request includes a changelog file. The entry should be a short description of the change which makes sense to users (e.g. not "Moved X method from `EventStore` to `EventWorkerStore`."), using markdown where necessary, mostly for code blocks.
- Pull request includes a sign off
- Code style is correct (run the linters)