Description
This is a current major goal, but we didn’t have an issue tracking it!
This package should be thread-safe, but because it is based on Requests, we can’t guarantee that. Requests’s authors won’t guarantee it either: at various times they have expressed everything from certainty that it’s not thread-safe to cautious optimism that it might be, but overall they have said they don’t plan to go out of their way to make certain (and then to maintain that status).
Thread safety is important for EDGI’s use: we need to pull lots of data and so need to make get_memento() calls concurrent. Connections also need to be pooled and shared across threads for stability (when EDGI implemented some hacks to do this, speed and reliability both improved considerably). In any case, other EDGI codebases implement some pretty nutty workarounds to make that relatively safe. That shouldn’t be necessary, and people should just be able to use a single WaybackClient across multiple threads.
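As a rough illustration of the usage we want to make safe, here is a minimal sketch of one client shared by a thread pool. `StubClient` and `fetch_all` are hypothetical stand-ins invented for this example (the stub makes no network calls and is not the real `WaybackClient`); only the sharing pattern is the point.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StubClient:
    """Hypothetical stand-in for WaybackClient; it makes no network calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self.calls = 0

    def get_memento(self, url):
        # A real client would fetch the memento over a pooled connection.
        # The lock only protects this stub's call counter.
        with self._lock:
            self.calls += 1
        return f"memento of {url}"


def fetch_all(client, urls, max_workers=4):
    """Fetch every URL concurrently using one shared client."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the input URLs.
        return list(executor.map(client.get_memento, urls))


urls = [f"https://example.com/page/{i}" for i in range(10)]
client = StubClient()
results = fetch_all(client, urls)
```

With a thread-safe client, nothing beyond this should be needed: no per-thread clients and no shared-pool workarounds.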
Ideally, we should switch to a different HTTP client instead of Requests. Two options:
- urllib3 is thread-safe and is what Requests is based on, so it should be a reliable switch. Its API is very different, though, so there’s lots to update and test.
- httpx is new, has an API that matches Requests, and has lots of fancy features. It also has async support in case we ever want to make this package async-capable in the future. However, it’s still in beta (should be final before the end of the year), and may possibly have some funny under-the-hood differences we’ll need to account for. We’d at least need to figure out how to re-implement our crazy hack for Wayback’s gzip issues.
Some other approaches worth keeping in the back pocket, but that probably aren’t ideal:
- A sketched out implementation in #23 (“Sketch out a way to support multithreading”) makes a funky abstraction that lets it appear as though you are using a single WaybackClient on multiple threads, when in fact you are using several. It’s clever, but also a little hacky and probably has a lot of messy corner cases as a result. I don’t feel great about it.
- We could implement EDGI’s workaround from web-monitoring-processing under the hood so that all connections across all clients are pooled. This makes things magically seem to work even if you create separate clients on each thread, but it’s probably unexpected behavior. If a user was actually trying to isolate sets of connections, this would get in their way (and do it in a silent way so they might not even know). It also depends on a small part of Requests staying as thread-safe as it currently is, and there are no guarantees there.
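To make the first of these fallback ideas concrete, here is a minimal sketch of the general "looks like one client, is actually one per thread" pattern, built on `threading.local`. This is not the code from #23; `PerThreadClient`, `StubClient`, and `make_client` are hypothetical names invented for illustration, and the stub makes no network calls.

```python
import threading


class StubClient:
    """Hypothetical stand-in for a real client that is not thread-safe."""

    def __init__(self):
        self.owner = threading.get_ident()

    def get_memento(self, url):
        # Fail loudly if this instance is used outside its creating thread.
        assert threading.get_ident() == self.owner
        return f"memento of {url}"


class PerThreadClient:
    """Looks like one client, but lazily creates a separate one per thread."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def _client(self):
        # Each thread sees its own attributes on the threading.local object.
        client = getattr(self._local, "client", None)
        if client is None:
            client = self._local.client = self._factory()
        return client

    def get_memento(self, url):
        return self._client().get_memento(url)


created = []
created_lock = threading.Lock()


def make_client():
    # Record every underlying client so we can see how many were made.
    client = StubClient()
    with created_lock:
        created.append(client)
    return client


shared = PerThreadClient(make_client)
results = []


def worker(n):
    for _ in range(2):
        results.append(shared.get_memento(f"https://example.com/{n}"))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Three threads sharing one `PerThreadClient` end up with three underlying clients. Even this small version hints at the corner cases: nothing here closes the per-thread clients when their threads exit, and connections are not shared between threads.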