feat: add FileStore implementation for cache (#427) #443
Conversation
…r read after remove
```rust
};

if expiry.is_expired(None) {
    return Ok(false);
```
should we not remove the file here (early on), right when we find out the file has expired, making this a single source of truth?
Yes, that would be more efficient. Initially, I was aiming to match the eviction policy where only read() and contains_key() may delete. I agree it would be better to eagerly remove it once expired.
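The eager-removal idea could look roughly like this. This is a simplified, synchronous sketch using std::fs and a plain little-endian u64 expiry header; the function name, the header layout, and the "0 means no expiry" convention are assumptions for illustration, not the PR's actual code (which uses tokio and the crate's error types):

```rust
use std::fs;
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Reads the 8-byte expiry header and, if the entry has expired,
/// removes the file right away so every caller sees the same state
/// (the "single source of truth" idea from the review comment).
fn check_and_evict(path: &Path) -> std::io::Result<bool> {
    let bytes = fs::read(path)?;
    // assumption: bytes 0..8 hold the expiry as a little-endian u64
    let header: [u8; 8] = bytes[..8].try_into().expect("file shorter than header");
    let expires_at = u64::from_le_bytes(header);
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before epoch")
        .as_secs();
    // assumption: 0 means "no expiry"
    if expires_at != 0 && expires_at <= now {
        // expired entries are deleted on discovery, not lazily later
        fs::remove_file(path)?;
        return Ok(false);
    }
    Ok(true)
}
```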
```rust
let mut buffer = Vec::new();

// advances cursor by the expiry header offset
file.seek(SeekFrom::Start(8))
```
We seem to seek here again after doing so when we parse the expiry. Does that not make this seek redundant?
Thinking about it again, I agree
I don't agree. First, the parse_expiry function resets the cursor back to the beginning (as it should), so this seek is needed. Second, it should stay here for correctness purposes: we can't assume the cursor on the file will always be at the beginning.
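The cursor-handling argument can be sketched as follows. This is a simplified, synchronous stand-in over any Read + Seek source; the function names and the 8-byte header offset mirror the discussion but are illustrative assumptions, not the PR's code:

```rust
use std::io::{Read, Seek, SeekFrom};

const EXPIRY_HEADER_LEN: u64 = 8; // assumed header size from the discussion

/// Mirrors the parse step: it never assumes where the cursor is,
/// and it rewinds to the start before returning.
fn parse_expiry<R: Read + Seek>(r: &mut R) -> std::io::Result<u64> {
    r.seek(SeekFrom::Start(0))?; // don't trust the current position
    let mut header = [0u8; 8];
    r.read_exact(&mut header)?;
    r.seek(SeekFrom::Start(0))?; // reset for the next reader
    Ok(u64::from_le_bytes(header))
}

/// Because parse_expiry resets the cursor, this absolute seek past
/// the header is required, not redundant.
fn read_payload<R: Read + Seek>(r: &mut R) -> std::io::Result<Vec<u8>> {
    r.seek(SeekFrom::Start(EXPIRY_HEADER_LEN))?;
    let mut buf = Vec::new();
    r.read_to_end(&mut buf)?;
    Ok(buf)
}
```

Seeking to an absolute offset rather than relying on the previous reader's final position keeps each function correct in isolation, which is the point made in the reply.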
```rust
}

#[cfg(test)]
mod tests {
```
For the failing nightly tests, I suspect it might be broken upstream; I have that pinned in my PR (cot/.github/workflows/rust.yml, line 20 in 26ef6bb).
m4tx left a comment
Thank you for your patience waiting for my review! This looks very good already, but there are a few things I'd like to see clarified before we merge.
```rust
//! # use std::path::PathBuf;
//! # #[tokio::main]
//! # async fn main() {
//! let path = PathBuf::from("./cache_data");
//! let store = FileStore::new(path).expect("Failed to initialize store");
//!
//! let key = "example_key".to_string();
//! let value = serde_json::json!({"data": "example_value"});
//!
//! store.insert(key.clone(), value.clone(), Default::default()).await.unwrap();
//!
//! let retrieved = store.get(&key).await.unwrap();
//! assert_eq!(retrieved, Some(value));
//!
//! # }
```

nitpick: formatting

```suggestion
//! # use std::path::PathBuf;
//! # #[tokio::main]
//! # async fn main() {
//!
//! let path = PathBuf::from("./cache_data");
//! let store = FileStore::new(path).expect("Failed to initialize store");
//!
//! let key = "example_key".to_string();
//! let value = serde_json::json!({"data": "example_value"});
//!
//! store.insert(key.clone(), value.clone(), Default::default()).await.unwrap();
//!
//! let retrieved = store.get(&key).await.unwrap();
//! assert_eq!(retrieved, Some(value));
//! # }
```
```rust
use std::path::Path;

use chrono::{DateTime, Utc};
use md5::{Digest, Md5};
```
Maybe it's worth using sha2 instead? I know md5 is already in our indirect dependencies - but sha2 is a direct dependency (used in the auth), and probably less prone to collisions. After all it doesn't matter that much, but it might be worth sticking to one algorithm as broadly as possible.
Got it!
Actually, for caching purposes we don't need a classic cryptographically secure hashing algorithm; we should focus on a fast one instead. I'd suggest using BLAKE3 for that case, and it has an official Rust implementation. If we trust their benchmarks, it's much faster than SHA2.
@seqre only if we (or any of our dependencies) use it already. I don't see much value in having the fastest function for hashing the cache keys (which will typically be small).
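Whichever algorithm wins this discussion, the pattern under debate is the same: derive a fixed-length, filesystem-safe file name from the cache key. A minimal dependency-free sketch of that pattern is below; it uses std's DefaultHasher purely so the example compiles without sha2/BLAKE3, and it is explicitly NOT the collision-resistant choice the thread is arguing for:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative stand-in for the store's create_key_hash: maps an
/// arbitrary key to a fixed-length hex file name. A real store would
/// swap DefaultHasher for sha2 or BLAKE3 as discussed in the review,
/// since a 64-bit non-cryptographic hash is far easier to collide.
fn create_key_hash(key: &str) -> String {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    // 16 lowercase hex digits: safe on every common filesystem
    format!("{:016x}", hasher.finish())
}
```

The point of hashing at all is that raw keys may contain path separators or exceed filename length limits, while the hex digest is always a valid file name.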
```rust
    key: &str,
) -> CacheStoreResult<Option<(tokio::fs::File, std::path::PathBuf)>> {
    let key_hash = FileStore::create_key_hash(key);
    let path = self.dir_path.join(&key_hash);
```
What do we do in case there's a collision when hashing?
Currently, there's no collision resolution. My idea is to embed the real key name in the file as a header (this is simpler but incurs an extra syscall only when the real name doesn't match the file). Another option would be to implement a jump table and sync it to a file whenever a new hash is pushed.
I don't think we really need collision discovery unless it'd be simple to implement and fast at runtime. If we switch to another algorithm, the collision resistance is pretty high; for BLAKE3, for example, it's 2**128. At that level of resistance, getting any collision is a feat, and getting a specific collision for an attack seems almost impossible.
```rust
let data = serde_json::to_string(&value)
    .map_err(|e| FileCacheStoreError::Serialize(Box::new(e)))?;

let mut buffer: Vec<u8> = Vec::with_capacity(8 + data.len());
```
Let's extract the magic number to a named constant to clarify what it means.
And since we essentially create a custom binary format, it might be worth documenting this at the module level.
Got it!
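The named-constant suggestion might come out something like this. The constant name, function name, and the exact on-disk layout comment are assumptions sketching the custom binary format the review asks to document, not the PR's final code:

```rust
/// Hypothetical on-disk entry layout for the file cache:
///   bytes 0..8  - expiry as a little-endian u64 unix timestamp
///   bytes 8..   - the serialized JSON value
const EXPIRY_HEADER_LEN: usize = 8;

/// Builds the byte buffer for one cache entry: expiry header
/// followed by the serialized payload.
fn encode_entry(expires_at: u64, data: &str) -> Vec<u8> {
    let mut buffer: Vec<u8> = Vec::with_capacity(EXPIRY_HEADER_LEN + data.len());
    buffer.extend_from_slice(&expires_at.to_le_bytes());
    buffer.extend_from_slice(data.as_bytes());
    buffer
}
```

With the constant in place, the magic `8` disappears from both the writer and the reader's seek offset, and the module-level doc comment becomes the single description of the format.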
```rust
if let Ok(meta) = entry.metadata().await
    && meta.is_file()
{
    total_size += meta.len();
```
This will return the number of bytes; not the number of entries, no?
Yes, this would return the total bytes. I was under the impression that approx_size depends on the cache type to track its quantity in its own unit. If this is changed to the number of entries, should this byte-aggregation function be kept around for future use (maybe for monitoring)?
I wouldn't say so, until we add functionality for that for all cache stores, it would be just dead code. We can always recreate it!
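The entries-not-bytes version the thread settles on could look like this synchronous sketch (the real code iterates with tokio::fs and the crate's error types; the function name here is hypothetical):

```rust
use std::fs;
use std::path::Path;

/// Counts cache entries (regular files in the cache directory)
/// rather than summing their sizes, so approx_size reports a
/// number of entries like the other cache stores.
fn approx_entry_count(dir: &Path) -> std::io::Result<u64> {
    let mut count = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        if entry.metadata()?.is_file() {
            count += 1;
        }
    }
    Ok(count)
}
```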
seqre left a comment
Thank you for your contribution, it's a great start! There are some things that need changing before we merge it, though.
```rust
//! let key = "example_key".to_string();
//! let value = serde_json::json!({"data": "example_value"});
//!
//! store.insert(key.clone(), value.clone(), Default::default()).await.unwrap();
```
nit: I'd change it to Timeout::default() so that people reading the docs know what the 3rd argument is.
```rust
use crate::config::Timeout;
use crate::error::error_impl::impl_into_cot_error;

const ERROR_PREFIX: &str = "file based cache store error:";
```
nit:

```suggestion
const ERROR_PREFIX: &str = "file-based cache store error:";
```
```rust
#[error("{ERROR_PREFIX} file dir creation error: {0}")]
DirCreation(Box<dyn std::error::Error + Send + Sync>),

/// An error occured during temp file creation
#[error("{ERROR_PREFIX} file temp file creation error: {0}")]
TempFileCreation(Box<dyn std::error::Error + Send + Sync>),

/// An error occured during write/stream file
#[error("{ERROR_PREFIX} file io error: {0}")]
```
nit: "file" is already included in ERROR_PREFIX

```suggestion
#[error("{ERROR_PREFIX} dir creation error: {0}")]
DirCreation(Box<dyn std::error::Error + Send + Sync>),

/// An error occured during temp file creation
#[error("{ERROR_PREFIX} temp file creation error: {0}")]
TempFileCreation(Box<dyn std::error::Error + Send + Sync>),

/// An error occured during write/stream file
#[error("{ERROR_PREFIX} io error: {0}")]
```
```rust
store.create_dir_sync_root()?;

Ok(store)
}

fn create_dir_sync_root(&self) -> CacheStoreResult<()> {
```
nit: I think sync should be a suffix to show it's another version of an existing async function

```suggestion
store.create_dir_root_sync()?;

Ok(store)
}

fn create_dir_root_sync(&self) -> CacheStoreResult<()> {
```
```rust
    &self,
    file: &mut tokio::fs::File,
) -> CacheStoreResult<Option<Value>> {
    if !self.parse_expiry(file).await? {
```
I'm not fully convinced parse_expiry is the best name; I had to spend a few seconds parsing this line mentally. It's what the function does technically, but logically it checks whether the file has expired. I wonder if it would be better to rename it to check_expiry, is_expired, or similar.
```rust
    Ok((temp_file, temp_path))
}

async fn file_open(
```
nit: maybe open_file_for_reading instead? It seems more aligned with what it does, or at least it says more about what it specifically does.
```rust
}

async fn contains_key(&self, key: &str) -> CacheStoreResult<bool> {
    let Ok(Some(mut file_tuple)) = self.file_open(key).await else {
```
Please deconstruct the tuple here with pattern matching into specific parts, so you don't have to use .0 and .1 below.
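The suggested destructuring, shown on a simplified synchronous stand-in (the stub names and std::fs usage are assumptions; the real code is async and returns the crate's error type):

```rust
use std::fs::File;
use std::path::PathBuf;

/// Hypothetical stand-in for the store's file_open: returns the
/// opened file together with its path, or None if it doesn't exist.
fn file_open_stub(path: PathBuf) -> std::io::Result<Option<(File, PathBuf)>> {
    match File::open(&path) {
        Ok(f) => Ok(Some((f, path))),
        Err(e) if e.kind() == std::io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}

/// Destructuring the tuple inside the let-else pattern gives the
/// body named bindings instead of opaque .0 / .1 accesses.
fn contains_key_stub(path: PathBuf) -> bool {
    let Ok(Some((_file, _path))) = file_open_stub(path) else {
        return false;
    };
    // _file and _path would be used here by name in the real method
    true
}
```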
Pull Request to Issue #427