Skip to content

spider-rs/hybrid_cache_server

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

hybrid_cache_server

A small Rust service that acts as a Chrome-aware cache indexing server:

  • RocksDB for persistent storage
  • DashMap as an in-memory cache
  • Meilisearch for index lookups
  • Deduped file bodies so shared assets (e.g. CDNs like jQuery) are stored once and reused across websites

You send it HTTP responses (with your own resource_key / website_key) and it:

  • Stores the metadata + body
  • Deduplicates the body via a content hash
  • Indexes metadata in Meilisearch
  • Lets you quickly retrieve:
    • a single resource by resource_key
    • all resources for a given website by website_key

Quick start

Make sure to have Rust. Rocksdb, and Meilisearch installed.

  1. cargo install hybrid_cache_server
  2. ./start.sh

Use the env variable CACHE_PORT to change the startup port.

Data Model

Keys

  • website_key
    Represents a site-level identifier. Examples:

    • "example.com"
    • "https://example.com"

    This is used to group resources so you can ask: “give me everything for this website”.

  • resource_key
    A unique cache key per resource (you generate this on the producer side, typically from your put_hybrid_cache logic).

    Examples:

    • GET:https://example.com/
    • GET:https://example.com/style.css
    • GET:https://cdn.example.com/jquery.js::Accept:text/javascript

    Whatever you use here must match the key you pass to put_hybrid_cache(cache_key, ...).

  • file_id
    Internally computed as blake3(body_bytes) and hex-encoded.
    All bodies with the same content share the same file_id and are stored once in RocksDB.

RocksDB Key Layout

Internally we use these key prefixes:

  • file:{file_id} → JSON-encoded FileEntry (the raw body bytes)
  • res:{resource_key} → JSON-encoded ResourceEntry (metadata, including file_id)
  • site:{website_key}::{resource_key} → empty value used as an index to quickly scan all resources for a site

This layout lets us:

  • Deduplicate file content (file:{file_id} reused across many resources)
  • Quickly find all resource_keys for a given website_key via prefix iteration

HTTP API

All endpoints are under /cache/*.

POST /cache/index

Index a single resource (one HTTP response).

Request

  • Headers:

    • Optional: X-Cache-Site: example.com
      Overrides/sets website_key if present.
  • Body: JSON CachedEntryPayload:

{
  "website_key": "example.com", // optional; can come from header or derived from URL
  "resource_key": "GET:https://example.com/style.css",
  "url": "https://example.com/style.css",
  "method": "GET",
  "status": 200,
  "request_headers": {
    "Accept": "text/css"
  },
  "response_headers": {
    "Content-Type": "text/css; charset=utf-8"
  },
  "body_base64": "LyogY3NzIGJvZHkgKi8K"
}

POST /cache/index/batch — Index a batch of resources

Index many HTTP responses at once.

Request

  • Method: POST
  • Path: /cache/index/batch
  • Headers:
    • Content-Type: application/json
    • Optional: X-Cache-Site: example.com (applies as a default/override depending on your server logic)
  • Body: JSON array of the same payload objects used in /cache/index
[
  {
    "website_key": "example.com",
    "resource_key": "GET:https://example.com/",
    "url": "https://example.com/",
    "method": "GET",
    "status": 200,
    "request_headers": { "Accept": "text/html" },
    "response_headers": { "Content-Type": "text/html" },
    "body_base64": "PGh0bWw+Li4uPC9odG1sPg=="
  },
  {
    "website_key": "example.com",
    "resource_key": "GET:https://example.com/app.js",
    "url": "https://example.com/app.js",
    "method": "GET",
    "status": 200,
    "request_headers": { "Accept": "*/*" },
    "response_headers": { "Content-Type": "application/javascript" },
    "body_base64": "Y29uc29sZS5sb2coImhpIik7"
  }
]

GET /cache/resource/{resource_key} — Fetch a cached resource

Lookup a cached resource by its resource_key.

Request

  • Method: GET
  • Path: /cache/resource/{resource_key}
  • Query params (optional):
    • raw=1 → return raw bytes (instead of JSON/base64)
    • format=bytes or format=raw → same as raw=1

Response

  • Default: JSON containing metadata + body_base64
  • With raw=1 (or format=raw|bytes): returns the raw body bytes (content-type may be inferred from stored headers)

Examples

Fetch JSON (default):

curl -sS "http://127.0.0.1:8080/cache/resource/GET:https%3A%2F%2Fexample.com%2Fapp.js"

GET /cache/site/{website_key} — List resources for a site

Lookup cached resources by website_key (ex: a domain / site key).

Request

  • Method: GET
  • Path: /cache/site/{website_key}

Response

Returns JSON for the site index (typically includes a list of resource keys and/or metadata, depending on your server’s index schema).

Example

curl -sS "http://127.0.0.1:8080/cache/site/example.com"

GET /cache/size — Cache size & stats

Returns current cache statistics for memory + RocksDB.

Request

  • Method: GET
  • Path: /cache/size

Response

JSON with stats (example fields):

  • rocksdb.*: RocksDB estimates and sizes
  • rocksdb_dir_bytes: on-disk directory usage
  • mem_cache.entries: in-memory entry count
  • mem_cache.body_bytes: in-memory body byte total

Example

curl -sS "http://127.0.0.1:8080/cache/size"

Docker

docker build -f docker/Dockerfile.ubuntu -t hybrid-cache:ubuntu --build-arg BIN_NAME=hybrid_cache_server .
docker run -p 8080:8080 -p 7700:7700 -e MEILI_MASTER_KEY=masterKey hybrid-cache:ubuntu

About

High performance cache index server

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published