Replies: 3 comments
-
Thank you, much appreciated! I've been trying to find time to work on a refactor of this library that should allow me to add something to handle this (among other things). I've just been struggling to find the time at the moment due to a whirlwind of life events. I'm hoping things will calm down relatively soon so I can pick things back up. Once I'm able to complete that refactor, I should be able to add an optional feature to the reqwest client that pulls in one of the existing middleware solutions to plug in and configure. Failing that, something custom should work.
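For anyone landing here in the meantime, here is a minimal sketch of what the "something custom" route could look like: a plain rate-limiting middleware built on governor, assuming reqwest-middleware 0.3+ (which passes `http::Extensions` to middleware). `RateLimitMiddleware` is just an illustrative name, and note that this naive version waits before every outgoing request, cached or not; making it skip the wait on cache hits is exactly the gap discussed in the rest of this thread.

```rust
// Illustrative only: a bare rate-limiting middleware, not part of http-cache.
use async_trait::async_trait;
use governor::DefaultDirectRateLimiter;
use reqwest::{Request, Response};
use reqwest_middleware::{Middleware, Next, Result};

struct RateLimitMiddleware {
    limiter: DefaultDirectRateLimiter,
}

#[async_trait]
impl Middleware for RateLimitMiddleware {
    async fn handle(
        &self,
        req: Request,
        extensions: &mut http::Extensions,
        next: Next<'_>,
    ) -> Result<Response> {
        // Naive approach: wait for a permit before every request, cached or not.
        self.limiter.until_ready().await;
        next.run(req, extensions).await
    }
}
```

It would be registered like any other middleware via `ClientBuilder::with`.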
-
Ah, yes, Real Life™. 😁 It's cool, I'll be happy to use/test your solution once you're able to get to it. In the meantime, here's what I came up with:

```rust
use anyhow::{anyhow, Result};
use core::time::Duration;
use governor::{DefaultDirectRateLimiter, Quota, RateLimiter};
use http_cache_reqwest::{CACacheManager, Cache, CacheMode, HttpCache, HttpCacheOptions};
use reqwest::Client;
use reqwest_middleware::{ClientBuilder, ClientWithMiddleware};
use std::num::NonZero;

#[tokio::main]
async fn main() -> Result<()> {
    // Cache-only client: with OnlyIfCached, send() never touches the network.
    let client = ClientBuilder::new(Client::new())
        .with(Cache(HttpCache {
            mode: CacheMode::OnlyIfCached,
            manager: CACacheManager::default(),
            options: HttpCacheOptions::default(),
        }))
        .build();

    // Placeholder: substitute the real list of URLs to fetch.
    let urls_to_fetch: Vec<String> = Vec::new();

    // At most one request every 6 seconds, no bursting.
    if let Some(quota) = Quota::with_period(Duration::from_secs(6)) {
        let quota = quota.allow_burst(NonZero::new(1).unwrap());
        let limiter = RateLimiter::direct(quota);
        for url in urls_to_fetch {
            let body = fetch_url(&client, &limiter, url.clone()).await?;
            println!("Fetched {url} ({} bytes)", body.len());
        }
    }
    Ok(())
}

async fn fetch_url(
    client: &ClientWithMiddleware,
    limiter: &DefaultDirectRateLimiter,
    url: String,
) -> Result<String> {
    // First check the cache (the client is in OnlyIfCached mode).
    let mut res = client.get(&url).send().await?;
    let cache_status = res
        .headers()
        .get("X-Cache")
        .and_then(|v| v.to_str().ok());
    if cache_status == Some("MISS") {
        // Only wait for the rate limiter if the response isn't cached,
        // then re-issue the request with the cache bypassed.
        limiter.until_ready().await;
        res = client
            .get(&url)
            .with_extension(CacheMode::NoCache)
            .send()
            .await?;
    }
    match res.text().await {
        Ok(body) => Ok(body),
        _ => Err(anyhow!("Failed to fetch url: {}", &url)),
    }
}
```
-
Worked on implementing a solution for this as part of PR #116
-
First of all, thank you very much for your work building and maintaining this! It's a great service to the Rust community. 🙏
I would like to have an HTTP cache, but also rate limit requests made to URLs when there is a cache miss. Like this:
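request → cache hit → return the cached response immediately (no rate limiting)
request → cache miss → wait for the rate limiter → fetch from the network, cache it, and return it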
This seems like the only way anyone would want this to work. And surely this is an extremely common desire for a scraper! And yet I can't figure out how to do this with existing libraries (as one or two middlewares). Either the rate limiting comes before the request, and thus can't use the cache status to skip limits; or it comes after, so responses are only returned after the rate limit is applied. This is not ideal because repeated requests are slower: any rate limit delay can't run concurrently with the previous fetch. Also, the last response is delayed even though the data has arrived and no more requests will ever be sent.
So I'm trying to figure out if I'm missing something. Do I really need to roll my own middleware for this? It seems like I can't use http-cache-reqwest because of the need to insert the rate limit in between the cache check and the real fetch.