Skip to content

fix(scorpio): refresh Antares mounts and add CL overlay support#1819

Merged
benjamin-747 merged 1 commit intoweb3infra-foundation:mainfrom
jerry609:fix/antares-cache-update
Jan 19, 2026
Merged

fix(scorpio): refresh Antares mounts and add CL overlay support#1819
benjamin-747 merged 1 commit intoweb3infra-foundation:mainfrom
jerry609:fix/antares-cache-update

Conversation

@jerry609
Copy link
Copy Markdown
Collaborator

This commit fixes stale Antares mount state by invalidating cache on unmount and adds CL overlay support for Antares mounts. CL build/clear now remounts to ensure layer changes take effect and whiteouts hide deleted files.

Changes:

  • Invalidate Antares mount cache on unmount/cleanup paths
  • Add CL overlay mount support with upper/cl layers
  • Remount on build_cl/clear_cl to refresh view
  • Refresh buck controller mount state before builds

Copilot AI review requested due to automatic review settings January 19, 2026 09:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes stale cache issues in Antares FUSE mounts and adds support for CL (changelist) overlays to enable viewing uncommitted changes in mounted filesystems. The implementation introduces TTL-based directory cache refresh, remounting logic for CL operations, and Buck2 daemon retry mechanisms to handle connection failures.

Changes:

  • Adds TTL-based directory cache refresh in dicfuse to prevent stale reads after unmount/remount cycles
  • Implements CL overlay support with HTTP-based file fetching, whiteout creation for deletions, and atomic remounting on build_cl/clear_cl operations
  • Enhances Buck2 integration with daemon connection retry logic and explicit mount lifecycle management

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
scorpio/src/dicfuse/store.rs Adds dir_needs_refresh function and TTL checks to ensure directories are reloaded when stale
scorpio/src/dicfuse/async_io.rs Modifies lookup to pre-refresh parent directories before path resolution
scorpio/src/daemon/antares.rs Implements CL layer building (fetch/download/whiteout), path normalization, and remount logic for build_cl/clear_cl operations
scorpio/src/antares/fuse.rs Reorders overlay layers to place CL above dicfuse for proper file override semantics
orion/src/buck_controller.rs Adds explicit unmount method to MountGuard with atomic tracking to prevent duplicate unmount attempts
orion/buck/run.rs Implements Buck2 daemon retry logic with automatic daemon kill and restart on connection failures

Comment on lines +570 to +589
fn normalize_abs_path(path: &str) -> String {
let trimmed = path.trim();
if trimmed.is_empty() {
return "/".to_string();
}
if trimmed == "/" {
return "/".to_string();
}
let mut normalized = if trimmed.starts_with('/') {
trimmed.to_string()
} else {
format!("/{trimmed}")
};
normalized = normalized.trim_end_matches('/').to_string();
if normalized.is_empty() {
"/".to_string()
} else {
normalized
}
}
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The normalize_abs_path function is identical to normalize_mount_path (lines 549-568). This is code duplication. Consider consolidating these into a single function or having one call the other to maintain DRY principles.

Copilot uses AI. Check for mistakes.
})?;

let mode = libc::S_IFCHR | 0o000;
let dev = 0;
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unsafe block calling libc::mknod requires a SAFETY comment explaining why this is safe. According to the repository's coding guidelines, unsafe code must include a comment with rationale. The CString conversion above ensures the path is nul-terminated and valid, but this should be documented explicitly.

Suggested change
let dev = 0;
let dev = 0;
// SAFETY: `c_path` is a live `CString`, so `as_ptr()` yields a valid, NUL-terminated
// C string pointer with no interior NUL bytes for the duration of this call. The
// `mode` and `dev` arguments are plain values, and we check the return code for
// errors, propagating any OS error via `last_os_error`.

Copilot uses AI. Check for mistakes.
Comment on lines +552 to 571
let repo_path = {
let trimmed = repo.trim();
if trimmed.is_empty() {
"/".to_string()
} else if trimmed == "/" {
"/".to_string()
} else {
let mut normalized = if trimmed.starts_with('/') {
trimmed.to_string()
} else {
format!("/{trimmed}")
};
normalized = normalized.trim_end_matches('/').to_string();
if normalized.is_empty() {
"/".to_string()
} else {
normalized
}
return Err(e.into());
}
};
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path normalization code is duplicated across both the buck_controller and antares modules. Consider extracting this logic to a shared utility function to avoid duplication and ensure consistent behavior. The same normalization logic appears in scorpio/src/daemon/antares.rs (lines 549-589) and here.

Copilot uses AI. Check for mistakes.
Comment on lines +1481 to +1491
AntaresFuse::new(
mountpoint.clone(),
self.dicfuse.clone(),
upper_dir.clone(),
existing_cl_dir.clone(),
)
.await
.map_err(|e| {
ServiceError::Internal(format!("failed to create placeholder fuse: {}", e))
})?
});
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The placeholder fuse is created with self.dicfuse.clone() instead of the dicfuse instance extracted from the original entry. This could cause issues during rollback if unmount or build_cl_layer fails, as the placeholder would reference the wrong dicfuse instance (potentially the global root instance instead of a subdirectory-specific instance for this mount). Use 'dicfuse' instead of 'self.dicfuse.clone()' for consistency with the final remount.

Copilot uses AI. Check for mistakes.
Comment on lines +1634 to +1644
AntaresFuse::new(
mountpoint.clone(),
self.dicfuse.clone(),
upper_dir.clone(),
existing_cl_dir.clone(),
)
.await
.map_err(|e| {
ServiceError::Internal(format!("failed to create placeholder fuse: {}", e))
})?
});
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The placeholder fuse is created with self.dicfuse.clone() instead of the dicfuse instance extracted from the original entry. This could cause issues during rollback if unmount or clear operations fail, as the placeholder would reference the wrong dicfuse instance (potentially the global root instance instead of a subdirectory-specific instance for this mount). Use 'dicfuse' instead of 'self.dicfuse.clone()' for consistency with the final remount.

Copilot uses AI. Check for mistakes.
Comment on lines +634 to +789
async fn fetch_cl_files(&self, cl_link: &str) -> Result<Vec<ClFileEntry>, ServiceError> {
let base_url = crate::util::config::base_url();
let url = format!("{base_url}/api/v1/cl/{cl_link}/files-list");
let client = Self::http_client()?;
let resp = client
.get(url)
.send()
.await
.map_err(|e| ServiceError::Internal(format!("failed to fetch CL files: {}", e)))?;
if !resp.status().is_success() {
return Err(ServiceError::Internal(format!(
"failed to fetch CL files: HTTP {}",
resp.status()
)));
}
let body: CommonResult<Vec<ClFileEntry>> = resp.json().await.map_err(|e| {
ServiceError::Internal(format!("failed to parse CL files response: {}", e))
})?;
if !body.req_result {
return Err(ServiceError::Internal(format!(
"CL files response error: {}",
body.err_message
)));
}
Ok(body.data.unwrap_or_default())
}

async fn download_blob_to_path(
&self,
client: &Client,
oid: &str,
dest: &Path,
) -> Result<(), ServiceError> {
let base_url = crate::util::config::base_url();
let clean_oid = oid.trim_start_matches("sha1:");
let url = format!("{base_url}/api/v1/file/blob/{clean_oid}");
let resp = client
.get(url)
.send()
.await
.map_err(|e| ServiceError::Internal(format!("failed to download blob: {}", e)))?;
if !resp.status().is_success() {
return Err(ServiceError::Internal(format!(
"failed to download blob {}: HTTP {}",
clean_oid,
resp.status()
)));
}

if let Some(parent) = dest.parent() {
tokio::fs::create_dir_all(parent).await.map_err(|e| {
ServiceError::Internal(format!(
"failed to create CL parent dir {:?}: {}",
parent, e
))
})?;
}

let mut file = tokio::fs::File::create(dest).await.map_err(|e| {
ServiceError::Internal(format!("failed to create file {:?}: {}", dest, e))
})?;
let bytes = resp
.bytes()
.await
.map_err(|e| ServiceError::Internal(format!("failed to read blob data: {}", e)))?;
file.write_all(&bytes).await.map_err(|e| {
ServiceError::Internal(format!("failed to write file {:?}: {}", dest, e))
})?;
Ok(())
}

fn create_whiteout(path: &Path) -> Result<(), ServiceError> {
if let Some(parent) = path.parent() {
if let Err(e) = std::fs::create_dir_all(parent) {
return Err(ServiceError::Internal(format!(
"failed to create whiteout parent {:?}: {}",
parent, e
)));
}
}

if path.exists() {
let _ = std::fs::remove_file(path);
let _ = std::fs::remove_dir_all(path);
}

let c_path = CString::new(path.as_os_str().as_bytes()).map_err(|e| {
ServiceError::Internal(format!("invalid whiteout path {:?}: {}", path, e))
})?;

let mode = libc::S_IFCHR | 0o000;
let dev = 0;
let res = unsafe { libc::mknod(c_path.as_ptr(), mode, dev) };
if res != 0 {
return Err(ServiceError::Internal(format!(
"failed to create whiteout {:?}: {}",
path,
std::io::Error::last_os_error()
)));
}
Ok(())
}

async fn build_cl_layer(
&self,
mount_path: &str,
cl_link: &str,
cl_dir: &Path,
) -> Result<(), ServiceError> {
if cl_link.trim().is_empty() {
return Err(ServiceError::InvalidRequest(
"cl link cannot be empty".to_string(),
));
}

if cl_dir.exists() {
tokio::fs::remove_dir_all(cl_dir).await.map_err(|e| {
ServiceError::Internal(format!("failed to clear CL dir {:?}: {}", cl_dir, e))
})?;
}
tokio::fs::create_dir_all(cl_dir).await.map_err(|e| {
ServiceError::Internal(format!("failed to create CL dir {:?}: {}", cl_dir, e))
})?;

let files = self.fetch_cl_files(cl_link).await?;
if files.is_empty() {
return Ok(());
}

let client = Self::http_client()?;
for file in files {
let rel_path = match Self::relative_path_for_mount(&file.path, mount_path) {
Some(p) => p,
None => continue,
};
let dest = cl_dir.join(rel_path);
match file.action.as_str() {
"new" | "modified" => {
self.download_blob_to_path(&client, &file.sha, &dest)
.await?;
}
"deleted" => {
Self::create_whiteout(&dest)?;
}
other => {
tracing::warn!(
"Unknown CL action '{}' for path {}, skipping",
other,
file.path
);
}
}
}

Ok(())
}
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new CL layer building functionality (build_cl_layer, fetch_cl_files, download_blob_to_path, create_whiteout) lacks test coverage. While there are tests for the build_cl API endpoint, they use a mock service that doesn't exercise the actual HTTP fetching, blob downloading, and whiteout creation logic. Consider adding integration tests that verify the CL layer building, especially the path normalization, whiteout creation, and error handling.

Copilot uses AI. Check for mistakes.
@jerry609 jerry609 force-pushed the fix/antares-cache-update branch from 21e9ecd to 2457469 Compare January 19, 2026 09:19
This commit fixes stale Antares mount state by invalidating cache on unmount
and adds CL overlay support for Antares mounts. CL build/clear now remounts
to ensure layer changes take effect and whiteouts hide deleted files.

Changes:
- Invalidate Antares mount cache on unmount/cleanup paths
- Add CL overlay mount support with upper/cl layers
- Remount on build_cl/clear_cl to refresh view
- Refresh buck controller mount state before builds
- Fix clippy warnings and apply rustfmt in Antares CL path validation

Signed-off-by: jerry609 <1772030600@qq.com>
@jerry609 jerry609 force-pushed the fix/antares-cache-update branch from 2457469 to 39daba0 Compare January 19, 2026 09:25
@benjamin-747 benjamin-747 added this pull request to the merge queue Jan 19, 2026
Merged via the queue into web3infra-foundation:main with commit eddd7f7 Jan 19, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants