Commit 55706e9
feat(GCS): Update CustomTime metadata field at cache entry hits
This commit introduces architectural support for a cache-provider-specific routine that updates an `atime`-like (access time) timestamp every time a cache entry is successfully hit. Several cloud providers implement such semantics under different names and APIs, most commonly to provide a better time-out and culling mechanism. The logic in _sccache_ only performs the bookkeeping of this information; with it, the cloud provider's lifecycle routines get the input they need to keep cloud costs and cache size low by culling objects that have not been hit for a set amount of time.

Support for this logic is added specifically to the GCS cache implementation, where the custom access time metadata is available under the `CustomTime` attribute, which may only ever be bumped forward in time. If a bump is unsuccessful (e.g. because clocks have drifted out of sync), the error is silently discarded.
1 parent c719e5a · commit 55706e9

File tree: 7 files changed, +407 −9 lines

docs/Gcs.md — 10 additions, 0 deletions

````diff
@@ -65,6 +65,16 @@ export SCCACHE_GCS_KEY_PATH=secret-gcp-storage.json
 Cache location GCS, bucket: Bucket(name=<bucket name in GCP>), key_prefix: (none)
 ```
 
+## Lifecycle management
+
+Sccache updates the `CustomTime` metadata field of cache objects every time
+there is a cache hit.
+This can be used to implement automatic cleanup in GCS using the
+["Custom time before"](https://cloud.google.com/storage/docs/lifecycle#customtimebefore)
+or ["Days since custom time"](https://cloud.google.com/storage/docs/lifecycle#dayssincecustomtime)
+conditions on the bucket in order to remove cache entries which have not been
+actively used for a certain amount of time.
+
 ## Deprecation
 
 `SCCACHE_GCS_OAUTH_URL` have been deprecated and not supported, please use `SCCACHE_GCS_SERVICE_ACCOUNT` instead.
````
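The lifecycle conditions mentioned in the docs above are configured on the bucket itself. As a hedged sketch (the 30-day threshold and file name are illustrative assumptions, not part of this commit), a GCS lifecycle configuration that deletes objects whose `CustomTime` has not been bumped for 30 days could look like:

```json
{
  "rule": [
    {
      "action": { "type": "Delete" },
      "condition": { "daysSinceCustomTime": 30 }
    }
  ]
}
```

Such a file can be applied to a bucket with, for example, `gsutil lifecycle set lifecycle.json gs://<your-bucket>`.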

src/cache/cache.rs — 172 additions, 7 deletions

```diff
@@ -16,7 +16,7 @@
 use crate::cache::azure::AzureBlobCache;
 use crate::cache::disk::DiskCache;
 #[cfg(feature = "gcs")]
-use crate::cache::gcs::GCSCache;
+use crate::cache::gcs;
 #[cfg(feature = "gha")]
 use crate::cache::gha::GHACache;
 #[cfg(feature = "memcached")]
@@ -51,7 +51,9 @@ use std::io::{self, Cursor, Read, Seek, Write};
 use std::path::{Path, PathBuf};
 use std::sync::Arc;
 use std::time::Duration;
+use std::time::Instant;
 use tempfile::NamedTempFile;
+use tokio::sync::RwLock as TokioRwLock;
 use zip::write::FileOptions;
 use zip::{CompressionMethod, ZipArchive, ZipWriter};
 
@@ -355,7 +357,8 @@ pub trait Storage: Send + Sync {
     ///
     /// - `Ok(CacheMode::ReadOnly)` means cache can only be used to `get`
     ///   cache.
-    /// - `Ok(CacheMode::ReadWrite)` means cache can do both `get` and `put`.
+    /// - `Ok(CacheMode::ReadWrite)` means cache can do all of the `get`,
+    ///   `put`, and `timestamp_cache_hit` methods.
     /// - `Err(err)` means cache is not setup correctly or not match with
     ///   users input (for example, user try to use `ReadWrite` but cache
     ///   is `ReadOnly`).
@@ -367,6 +370,21 @@ pub trait Storage: Send + Sync {
         Ok(CacheMode::ReadWrite)
     }
 
+    /// Stamp the custom "access time" or "custom time" record for an entry
+    /// in the cache, if present.
+    ///
+    /// It is not always possible or practical to query this information
+    /// within sccache itself.
+    ///
+    /// Returns a `Future` that will provide the result or error when the
+    /// stamp request finished. If the operation is supported and completed
+    /// successfully, the `Result` is `Ok(Some(Duration))`. If the operation
+    /// can not be performed for configuration reasons, `Ok(None)` is
+    /// returned. If the operation was expected to succeed but any kind of
+    /// error occurred, that `Err` is returned as the `Result`.
+    async fn timestamp_cache_hit(&self, key: &str) -> Result<Option<Duration>>;
+
     /// Get the storage location.
     fn location(&self) -> String;
 
@@ -402,6 +420,32 @@ pub trait Storage: Send + Sync {
     }
 }
 
+/// An interface to least recent usage time (`atime`-like) timestamp updates.
+#[async_trait]
+pub trait TimestampUpdater: Send + Sync {
+    /// Returns whether the current implementation can update the timestamp.
+    /// This might be `false` for configuration reasons, or due to the lack
+    /// of necessary rights.
+    fn can_update(&self) -> bool {
+        true
+    }
+
+    /// Returns whether the `TimestampUpdater` needs (re-)initialization.
+    /// A `true` value should indicate that a reinitialization is required,
+    /// or that it can not be determined whether one is required.
+    /// A `false` value shall only be returned if it is deterministically
+    /// known that reinitialization can be skipped.
+    async fn needs_init(&self) -> Result<bool>;
+
+    /// (Re-)initializes the timestamp updater's runtime data, such as
+    /// authentication tokens.
+    async fn init(&mut self) -> Result<()>;
+
+    /// Updates the least recent use timestamp (if applicable) of the cache
+    /// entry identified by `key` to the current timestamp.
+    async fn update(&self, key: &str) -> Result<()>;
+}
+
 /// Configuration switches for preprocessor cache mode.
 #[derive(Debug, Copy, Clone, PartialEq, Eq, Serialize, Deserialize)]
 #[serde(deny_unknown_fields)]
@@ -453,10 +497,9 @@ impl PreprocessorCacheModeConfig {
     }
 }
 
-/// Implement storage for operator.
+/// Implement `Storage` for `opendal::Operator`.
 #[cfg(any(
     feature = "azure",
-    feature = "gcs",
     feature = "gha",
     feature = "memcached",
     feature = "redis",
@@ -480,7 +523,7 @@ impl Storage for opendal::Operator {
     }
 
     async fn put(&self, key: &str, entry: CacheWrite) -> Result<Duration> {
-        let start = std::time::Instant::now();
+        let start = Instant::now();
 
         self.write(&normalize_key(key), entry.finish()?).await?;
 
@@ -533,6 +576,29 @@ impl Storage for opendal::Operator {
         Ok(mode)
     }
 
+    #[cfg(any(
+        feature = "azure",
+        feature = "gha",
+        feature = "memcached",
+        feature = "redis",
+        feature = "s3",
+        feature = "webdav",
+    ))]
+    async fn timestamp_cache_hit(&self, _key: &str) -> Result<Option<Duration>> {
+        let scheme = self.info().scheme();
+        match scheme {
+            #[allow(unreachable_patterns)]
+            // If we build only with `cargo build --no-default-features`, we
+            // only want to use sccache with a local cache and no remote
+            // storage support. Also, the lack of support for
+            // `timestamp_cache_hit` is not a problem for the provider cases
+            // not implemented here.
+            _ => {
+                debug!("timestamp_cache_hit is not supported for {scheme}");
+                Err(anyhow!("Not implemented."))
+            }
+        }
+    }
+
     fn location(&self) -> String {
         let meta = self.info();
         format!(
@@ -552,6 +618,92 @@ impl Storage for opendal::Operator {
     }
 }
 
+/// Wrapper object for `Storage` implementations where a `TimestampUpdater`
+/// implementation is also available.
+#[derive(Debug)]
+pub struct TimestampUpdatingStorage<S: Storage, U: TimestampUpdater> {
+    pub storage: S,
+    pub updater: Arc<TokioRwLock<U>>,
+}
+
+/// Implement `Storage` for an `opendal::Operator` that also retains a
+/// `TimestampUpdater`.
+///
+/// Normally, this implementation calls the usual `Storage` trait methods.
+#[cfg(feature = "gcs")]
+#[async_trait]
+impl<U: TimestampUpdater> Storage for TimestampUpdatingStorage<opendal::Operator, U> {
+    async fn get(&self, key: &str) -> Result<Cache> {
+        self.storage.get(key).await
+    }
+
+    async fn put(&self, key: &str, entry: CacheWrite) -> Result<Duration> {
+        self.storage.put(key, entry).await
+    }
+
+    async fn check(&self) -> Result<CacheMode> {
+        let as_storage: &dyn Storage = &self.storage;
+        as_storage.check().await
+    }
+
+    #[cfg(feature = "gcs")]
+    async fn timestamp_cache_hit(&self, key: &str) -> Result<Option<Duration>> {
+        let scheme = self.storage.info().scheme();
+        match scheme {
+            #[cfg(feature = "gcs")]
+            opendal::Scheme::Gcs => {
+                let start = Instant::now();
+                {
+                    // Try to run the update without reinitialization if
+                    // possible. This saves us taking the exclusive write lock
+                    // and speeds up overall performance if everyone can take
+                    // a const reference to `updater`.
+                    let updater = self.updater.read().await;
+                    if !updater.can_update() {
+                        // The inability to update the cache, if from a known
+                        // and verifiable property, is not an error.
+                        return Ok(None);
+                    }
+                    if !updater.needs_init().await? {
+                        updater.update(key).await?;
+                        return Ok(Some(start.elapsed()));
+                    }
+                }
+
+                {
+                    let mut updater = self.updater.write().await;
+                    updater.init().await?;
+                    updater.update(key).await?;
+
+                    Ok(Some(start.elapsed()))
+                }
+            }
+            #[allow(unreachable_patterns)]
+            _ => self.storage.timestamp_cache_hit(key).await,
+        }
+    }
+
+    fn location(&self) -> String {
+        self.storage.location()
+    }
+
+    async fn current_size(&self) -> Result<Option<u64>> {
+        self.storage.current_size().await
+    }
+
+    async fn max_size(&self) -> Result<Option<u64>> {
+        self.storage.max_size().await
+    }
+}
+
 /// Normalize key `abcdef` into `a/b/c/abcdef`
 pub(in crate::cache) fn normalize_key(key: &str) -> String {
     format!("{}/{}/{}/{}", &key[0..1], &key[1..2], &key[2..3], &key)
@@ -587,7 +739,7 @@ pub fn storage_from_config(
         }) => {
             debug!("Init gcs cache with bucket {bucket}, key_prefix {key_prefix}");
 
-            let storage = GCSCache::build(
+            let storage = gcs::GCSCache::build(
                 bucket,
                 key_prefix,
                 cred_path.as_deref(),
@@ -596,8 +748,21 @@ pub fn storage_from_config(
                 credential_url.as_deref(),
             )
             .map_err(|err| anyhow!("create gcs cache failed: {err:?}"))?;
+            let updater = gcs::GCSCustomTimeUpdater::new(
+                bucket,
+                key_prefix,
+                cred_path.as_deref(),
+                service_account.as_deref(),
+                (*rw_mode).into(),
+                credential_url.as_deref(),
+            );
 
-            return Ok(Arc::new(storage));
+            let storage_with_updater = TimestampUpdatingStorage {
+                storage,
+                updater: Arc::new(TokioRwLock::new(updater)),
+            };
+
+            return Ok(Arc::new(storage_with_updater));
         }
         #[cfg(feature = "gha")]
         CacheType::GHA(config::GHACacheConfig { ref version, .. }) => {
```
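The `timestamp_cache_hit` implementation above uses a double-checked locking pattern: the shared read lock serves the common case, and the exclusive write lock is taken only when the updater reports that it needs (re-)initialization. A minimal synchronous sketch of the same pattern, with a hypothetical mock updater and `std::sync::RwLock` standing in for the async tokio lock:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

// Hypothetical stand-in for a `TimestampUpdater` implementation: `token`
// models cached credentials, `updates` counts successful timestamp bumps.
struct MockUpdater {
    token: Option<String>,
    updates: AtomicUsize,
}

impl MockUpdater {
    fn can_update(&self) -> bool {
        true
    }
    fn needs_init(&self) -> bool {
        // Reinitialization is needed until credentials exist.
        self.token.is_none()
    }
    fn init(&mut self) {
        self.token = Some("fresh-token".to_string());
    }
    fn update(&self, _key: &str) -> Result<(), String> {
        if self.token.is_some() {
            self.updates.fetch_add(1, Ordering::SeqCst);
            Ok(())
        } else {
            Err("not initialized".to_string())
        }
    }
}

// Sketch of the fast-path/slow-path structure of `timestamp_cache_hit`.
fn timestamp_cache_hit(lock: &RwLock<MockUpdater>, key: &str) -> Result<bool, String> {
    {
        // Fast path: shared read access only.
        let updater = lock.read().unwrap();
        if !updater.can_update() {
            return Ok(false); // analogous to `Ok(None)` above: not an error
        }
        if !updater.needs_init() {
            updater.update(key)?;
            return Ok(true);
        }
    } // read guard dropped here, before the write lock is taken

    // Slow path: exclusive access to reinitialize, then update.
    let mut updater = lock.write().unwrap();
    updater.init();
    updater.update(key)?;
    Ok(true)
}

fn main() {
    let lock = RwLock::new(MockUpdater { token: None, updates: AtomicUsize::new(0) });
    assert_eq!(timestamp_cache_hit(&lock, "abc"), Ok(true)); // triggers init
    assert_eq!(timestamp_cache_hit(&lock, "def"), Ok(true)); // fast path
    assert_eq!(lock.read().unwrap().updates.load(Ordering::SeqCst), 2);
    println!("pattern ok");
}
```

Note that the read guard is dropped before the write lock is acquired, so the slow path cannot deadlock against the fast path's own shared guard.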

src/cache/disk.rs — 5 additions, 0 deletions

```diff
@@ -168,6 +168,11 @@ impl Storage for DiskCache {
         Ok(self.rw_mode)
     }
 
+    async fn timestamp_cache_hit(&self, _key: &str) -> Result<Option<Duration>> {
+        // Not supported.
+        Ok(None)
+    }
+
     fn location(&self) -> String {
         format!("Local disk: {:?}", self.lru.lock().unwrap().path())
     }
```
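The `Ok(Some(_))` / `Ok(None)` / `Err(_)` contract of `timestamp_cache_hit` spans all the backends touched here: GCS bumps `CustomTime`, the disk cache returns `Ok(None)`, and unsupported operators return an error. A small hypothetical caller-side sketch (not part of the commit; a `String` error type is assumed for simplicity) of how these outcomes map to the behavior the commit message describes, with errors discarded rather than failing the cache hit:

```rust
use std::time::Duration;

// Classify the three documented outcomes of `timestamp_cache_hit`.
fn describe_stamp_result(res: Result<Option<Duration>, String>) -> &'static str {
    match res {
        Ok(Some(_elapsed)) => "timestamp updated",
        Ok(None) => "updates not supported; nothing to do",
        // e.g. clocks out of sync: logged at most, then discarded.
        Err(_e) => "update failed; error discarded",
    }
}

fn main() {
    assert_eq!(
        describe_stamp_result(Ok(Some(Duration::from_millis(3)))),
        "timestamp updated"
    );
    assert_eq!(
        describe_stamp_result(Ok(None)),
        "updates not supported; nothing to do"
    );
    assert_eq!(
        describe_stamp_result(Err("clock skew".to_string())),
        "update failed; error discarded"
    );
    println!("contract ok");
}
```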
