Replies: 3 comments
-
Thank you, much appreciated! I've been trying to find time to work on a refactor of this library that should allow me to add something to handle this (among other things). I've just been struggling to find the time at the moment due to a whirlwind of life events. I'm hoping things will calm down relatively soon so I can pick things back up. Once I'm able to complete that refactor, I should be able to add an optional feature to the reqwest client that pulls in one of the existing middleware solutions to plug in and configure. Failing that, something custom should work.
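For anyone landing here in the meantime, here is a minimal sketch of what the "something custom" route could look like: a plain rate-limiting middleware built on governor, assuming reqwest-middleware 0.3+ (which passes `http::Extensions` to middleware). `RateLimitMiddleware` is just an illustrative name, and note that this naive version waits before every outgoing request, cached or not; making it skip the wait on cache hits is exactly the gap discussed in the rest of this thread.

```rust
// Illustrative only: a bare rate-limiting middleware, not part of http-cache.
use async_trait::async_trait;
use governor::DefaultDirectRateLimiter;
use reqwest::{Request, Response};
use reqwest_middleware::{Middleware, Next, Result};

struct RateLimitMiddleware {
    limiter: DefaultDirectRateLimiter,
}

#[async_trait]
impl Middleware for RateLimitMiddleware {
    async fn handle(
        &self,
        req: Request,
        extensions: &mut http::Extensions,
        next: Next<'_>,
    ) -> Result<Response> {
        // Naive approach: wait for a permit before every request, cached or not.
        self.limiter.until_ready().await;
        next.run(req, extensions).await
    }
}
```

It would be registered like any other middleware via `ClientBuilder::with`.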
-
Ah, yes, Real Life™. 😁 It's cool, I'll be happy to use/test your solution once you're able to get to it. In the meantime, here's what I came up with:

```rust
use anyhow::{anyhow, Result};
use core::time::Duration;
use governor::{DefaultDirectRateLimiter, Quota, RateLimiter};
use http_cache_reqwest::{CACacheManager, Cache, CacheMode, HttpCache, HttpCacheOptions};
use reqwest::Client;
use reqwest_middleware::{ClientBuilder, ClientWithMiddleware};
use std::num::NonZero;

#[tokio::main]
async fn main() -> Result<()> {
    // Cache-only client: with OnlyIfCached, send() never touches the network.
    let client = ClientBuilder::new(Client::new())
        .with(Cache(HttpCache {
            mode: CacheMode::OnlyIfCached,
            manager: CACacheManager::default(),
            options: HttpCacheOptions::default(),
        }))
        .build();

    // Placeholder: substitute the real list of URLs to fetch.
    let urls_to_fetch: Vec<String> = Vec::new();

    // At most one request every 6 seconds, no bursting.
    if let Some(quota) = Quota::with_period(Duration::from_secs(6)) {
        let quota = quota.allow_burst(NonZero::new(1).unwrap());
        let limiter = RateLimiter::direct(quota);
        for url in urls_to_fetch {
            let body = fetch_url(&client, &limiter, url.clone()).await?;
            println!("Fetched {url} ({} bytes)", body.len());
        }
    }
    Ok(())
}

async fn fetch_url(
    client: &ClientWithMiddleware,
    limiter: &DefaultDirectRateLimiter,
    url: String,
) -> Result<String> {
    // First check the cache (the client is in OnlyIfCached mode).
    let mut res = client.get(&url).send().await?;
    let cache_status = res
        .headers()
        .get("X-Cache")
        .and_then(|v| v.to_str().ok());
    if cache_status == Some("MISS") {
        // Only wait for the rate limiter if the response isn't cached,
        // then re-issue the request with the cache bypassed.
        limiter.until_ready().await;
        res = client
            .get(&url)
            .with_extension(CacheMode::NoCache)
            .send()
            .await?;
    }
    match res.text().await {
        Ok(body) => Ok(body),
        _ => Err(anyhow!("Failed to fetch url: {}", &url)),
    }
}
```
-
Worked on implementing a solution for this as part of PR #116
-
First of all, thank you very much for your work building and maintaining this! It's a great service to the Rust community. 🙏
I would like to have an HTTP cache, but also rate limit requests made to URLs when there is a cache miss. Like this:
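request → cache hit → return the cached response immediately (no rate limiting)
request → cache miss → wait for the rate limiter → fetch from the network, cache it, and return it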
This seems like the only way anyone would want this to work. And surely this is an extremely common desire for a scraper! And yet I can't figure out how to do this with existing libraries (as one or two middlewares). Either the rate limiting comes before the request, and thus can't use the cache status to skip limits; or it comes after, so responses are only returned after the rate limit is applied. This is not ideal because repeated requests are slower: any rate limit delay can't run concurrently with the previous fetch. Also, the last response is delayed even though the data has arrived and no more requests will ever be sent.
So I'm trying to figure out if I'm missing something. Do I really need to roll my own middleware for this? It seems like I can't use http-cache-reqwest because of the need to insert the rate limit in between the cache check and the real fetch.