10 changes: 10 additions & 0 deletions .github/workflows/orion-client-deploy.yml
@@ -181,6 +181,11 @@ jobs:
sudo chmod +x /home/orion/orion-runner/run.sh
sudo chmod +x /home/orion/orion-runner/cleanup.sh

+# Grant CAP_DAC_READ_SEARCH file capability so orion can bypass
+# DAC read/search checks without running as root
+sudo setcap cap_dac_read_search+ep /home/orion/orion-runner/orion
Review comment (Member):

Security note: setcap + NoNewPrivileges=false

Granting CAP_DAC_READ_SEARCH via file capabilities is a reasonable way to avoid running as root. However, the systemd unit sets NoNewPrivileges=false, which means the process and its children can still gain privileges across execve (for example through setuid binaries or file capabilities). The service file already notes this as "relaxed for FUSE operations", but be aware that CAP_DAC_READ_SEARCH lets the process bypass DAC read and directory-search permission checks on any file on the system, not just FUSE mounts.

Consider documenting the threat model: which files orion needs to read that it otherwise could not, and whether CAP_DAC_OVERRIDE (already in CapabilityBoundingSet) is a superset that makes this redundant.
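
One way to ground the threat-model discussion is to have the binary report which DAC capabilities it actually holds at startup. The following is a minimal sketch, not code from this PR: the helper names are hypothetical, and it assumes a Linux host where `/proc/self/status` exposes a `CapEff:` line. The bit indices (1 for CAP_DAC_OVERRIDE, 2 for CAP_DAC_READ_SEARCH) follow `linux/capability.h`.

```rust
use std::fs;

/// Bit index of CAP_DAC_OVERRIDE in the capability mask (linux/capability.h).
const CAP_DAC_OVERRIDE: u32 = 1;
/// Bit index of CAP_DAC_READ_SEARCH in the capability mask.
const CAP_DAC_READ_SEARCH: u32 = 2;

/// Parse the effective capability mask out of the text of /proc/<pid>/status.
/// Returns None when the `CapEff:` line is missing or is not valid hex.
fn parse_cap_eff(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("CapEff:"))
        .and_then(|hex| u64::from_str_radix(hex.trim(), 16).ok())
}

/// True when bit `cap` is set in `mask`.
fn has_cap(mask: u64, cap: u32) -> bool {
    mask & (1u64 << cap) != 0
}

/// Hypothetical startup check: returns (has READ_SEARCH, has OVERRIDE)
/// for the current process, or None if /proc is unavailable.
fn current_dac_caps() -> Option<(bool, bool)> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    let mask = parse_cap_eff(&status)?;
    Some((
        has_cap(mask, CAP_DAC_READ_SEARCH),
        has_cap(mask, CAP_DAC_OVERRIDE),
    ))
}
```

Logging the result of `current_dac_caps()` once at startup would show whether the file capability set by `setcap` took effect under the systemd unit, and whether CAP_DAC_OVERRIDE is effective at all or merely permitted by the bounding set.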

+getcap /home/orion/orion-runner/orion

# Update systemd service file if changed
if [ -f /home/orion/orion-runner/orion-runner.service ]; then
cp /home/orion/orion-runner/orion-runner.service /etc/systemd/system/
@@ -285,6 +290,11 @@ jobs:
sudo chmod +x /home/orion/orion-runner/run.sh
sudo chmod +x /home/orion/orion-runner/cleanup.sh

+# Grant CAP_DAC_READ_SEARCH file capability so orion can bypass
+# DAC read/search checks without running as root
+sudo setcap cap_dac_read_search+ep /home/orion/orion-runner/orion
+getcap /home/orion/orion-runner/orion

# Update systemd service file if changed
if [ -f /home/orion/orion-runner/orion-runner.service ]; then
sudo cp /home/orion/orion-runner/orion-runner.service /etc/systemd/system/
6 changes: 2 additions & 4 deletions orion/Cargo.toml
@@ -12,7 +12,7 @@ path = "src/main.rs"
api-model = { workspace = true }
tokio = { workspace = true, features = ["rt-multi-thread", "fs", "process"] }
tracing = { workspace = true }
-tracing-subscriber = { workspace = true }
+tracing-subscriber = { workspace = true, features = ["env-filter"] }
serde = { workspace = true, features = ["derive", "rc"] }
futures-util = { workspace = true }
once_cell = { workspace = true }
@@ -30,7 +30,5 @@ td_util_buck = { path = "./buck" }
thiserror = { workspace = true }
utoipa.workspace = true
common = { path = "../common" }
-ring = "0.17.14"
-hex = { workspace = true }
-scorpiofs = "0.2.0"
+scorpiofs = "0.2.1"
tokio-util = { workspace = true }
14 changes: 0 additions & 14 deletions orion/buck/run.rs
@@ -32,42 +32,28 @@ pub struct Buck2 {
     program: String,
     /// The result of running `root`, if we have done so yet.
     root: Option<PathBuf>,
-    /// The isolation directory to always use when invoking buck
-    isolation_dir: Option<String>,
 }

 impl Buck2 {
     pub fn new(program: String, root: PathBuf) -> Self {
         Self {
             program,
             root: Some(root),
-            isolation_dir: None,
         }
     }

     pub fn with_root(program: String, root: PathBuf) -> Self {
         Self {
             program,
             root: Some(root),
-            isolation_dir: None,
         }
     }

-    pub fn set_isolation_dir(&mut self, isolation_dir: String) {
-        self.isolation_dir = Some(isolation_dir);
-    }
-
     pub fn command(&self) -> Command {
         let mut command = Command::new(&self.program);
         command
             .env("BUCKD_STARTUP_TIMEOUT", "30")
             .env("BUCKD_STARTUP_INIT_TIMEOUT", "1200");
-        match &self.isolation_dir {
-            None => {}
-            Some(isolation_dir) => {
-                command.args(["--isolation-dir", isolation_dir]);
-            }
-        }
         command
     }

8 changes: 4 additions & 4 deletions orion/scorpio.toml
@@ -8,22 +8,22 @@
# ==============================================================================

# Mega service address (local mono service)
-base_url = "https://git.buck2hub.com"
-lfs_url = "https://git.buck2hub.com"
+base_url = "https://git.gitmega.com"
+lfs_url = "https://git.gitmega.com"

# Dicfuse data store
store_path = "/tmp/megadir/store"

# Scorpio daemon main mount point (orion no longer starts the scorpio daemon, but scorpiofs still needs this setting)
workspace = "/tmp/megadir/mount"
-
+config_file = "./config.toml"
# Git commit information
git_author = "MEGA"
Review comment (Member):

Nit: The added config_file = "./config.toml" uses a relative path. This will resolve relative to the working directory at runtime, which may vary depending on how orion is launched (systemd, manual, CI). Consider using an absolute path or documenting the expected CWD.
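
As an illustration of the "absolute path or documented CWD" suggestion, here is a minimal Rust sketch that resolves a configured path against an explicit base directory instead of the process working directory. It is an assumption for discussion only: `resolve_config_path` and `exe_dir` are hypothetical helpers, and nothing here reflects how scorpiofs actually loads `config_file`.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Resolve `configured` against `base_dir`.
/// Absolute paths are returned unchanged; relative paths are joined onto
/// `base_dir`, so the result no longer depends on the current working directory.
fn resolve_config_path(base_dir: &Path, configured: &str) -> PathBuf {
    let configured = Path::new(configured);
    if configured.is_absolute() {
        configured.to_path_buf()
    } else {
        base_dir.join(configured)
    }
}

/// Hypothetical choice of base directory: the directory containing the
/// running executable. Falls back to "." if the executable has no parent.
fn exe_dir() -> io::Result<PathBuf> {
    let exe = std::env::current_exe()?;
    Ok(exe
        .parent()
        .map(Path::to_path_buf)
        .unwrap_or_else(|| PathBuf::from(".")))
}
```

With a base directory of `/home/orion/orion-runner`, the value `./config.toml` would resolve to a file inside that directory whether orion is started by systemd, by hand, or from CI.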

git_email = "admin@mega.org"

# Dicfuse read configuration
dicfuse_readable = "true"
-load_dir_depth = "3"
+load_dir_depth = "5"
fetch_file_thread = "10"
dicfuse_import_concurrency = "4"
dicfuse_dir_sync_ttl_secs = "5"
43 changes: 42 additions & 1 deletion orion/src/antares.rs
@@ -3,7 +3,7 @@
//! This module provides a singleton wrapper around `scorpiofs::AntaresManager`
//! for managing overlay filesystem mounts used during build operations.

-use std::{error::Error, io, path::PathBuf, sync::Arc};
+use std::{error::Error, io, path::PathBuf, sync::Arc, time::Duration};

use scorpiofs::{AntaresConfig, AntaresManager, AntaresPaths};
use tokio::sync::OnceCell;
@@ -120,13 +120,54 @@ pub async fn mount_job(job_id: &str, cl: Option<&str>) -> Result<AntaresConfig,
.map_err(Into::into)
}

+/// Initialize Antares during Orion startup and eagerly trigger Dicfuse import.
+///
+/// This keeps the first build request from paying the full Dicfuse cold-start
+/// cost. Readiness waiting runs in the background so Orion can continue booting.
+#[allow(dead_code)] // Called from the bin target (main.rs), not visible to lib check.
+pub(crate) async fn warmup_dicfuse() -> Result<(), DynError> {
+    tracing::info!("Initializing Antares Dicfuse during Orion startup");
+    let manager = get_manager().await?;
+    let dicfuse = manager.dicfuse();
+
+    // Idempotent: safe even if the manager already started import internally.
+    dicfuse.start_import();
+
+    tokio::spawn(async move {
+        let warmup_timeout_secs: u64 = std::env::var("ORION_DICFUSE_WARMUP_TIMEOUT_SECS")
+            .ok()
+            .and_then(|v| v.parse().ok())
+            .unwrap_or(1200);
+        tracing::info!(
+            "Waiting for Antares Dicfuse warmup to finish (timeout: {}s)",
+            warmup_timeout_secs
+        );
+
Review comment (Member):

Nit: 20-minute warmup timeout

DICFUSE_WARMUP_TIMEOUT_SECS = 1200 (20 minutes) is quite generous. If the warmup is genuinely stuck, Orion will appear healthy (accepting WebSocket connections) but the first build will still hit cold-start latency since the warmup is just warn-logged and non-blocking.

Consider:

  1. Making this configurable via an environment variable (e.g., ORION_DICFUSE_WARMUP_TIMEOUT_SECS) so operators can tune it per-deployment.
  2. Exposing warmup status via a health check endpoint so load balancers can delay routing traffic until warmup completes.
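
The diff shown here already reads the timeout from `ORION_DICFUSE_WARMUP_TIMEOUT_SECS`, so the sketch that follows covers only the second suggestion. It is a minimal, hypothetical illustration and not part of this PR: `WarmupState`, `WarmupStatus`, and `DICFUSE_WARMUP` are invented names, and the mapping to HTTP 200/503 is an assumption about how a health endpoint might report readiness.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Warmup lifecycle as a health endpoint might see it (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WarmupState {
    Pending = 0,
    Ready = 1,
    TimedOut = 2,
}

/// Lock-free holder for the current warmup state.
struct WarmupStatus(AtomicU8);

impl WarmupStatus {
    const fn new() -> Self {
        Self(AtomicU8::new(WarmupState::Pending as u8))
    }

    /// Record a new state; intended to be called from the warmup task.
    fn set(&self, state: WarmupState) {
        self.0.store(state as u8, Ordering::Release);
    }

    /// Read the current state; unknown values are treated as Pending.
    fn get(&self) -> WarmupState {
        match self.0.load(Ordering::Acquire) {
            1 => WarmupState::Ready,
            2 => WarmupState::TimedOut,
            _ => WarmupState::Pending,
        }
    }

    /// Status code a health handler could return: 200 only once warmup is done.
    fn http_status(&self) -> u16 {
        if self.get() == WarmupState::Ready {
            200
        } else {
            503
        }
    }
}

/// Hypothetical process-wide instance shared by the warmup task and a handler.
static DICFUSE_WARMUP: WarmupStatus = WarmupStatus::new();
```

In this sketch the `Ok(_)` arm of the timeout match would call `DICFUSE_WARMUP.set(WarmupState::Ready)` and the `Err(_)` arm would set `TimedOut`, so a load balancer polling the endpoint sees 503 until the import has finished.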

+        match tokio::time::timeout(
+            Duration::from_secs(warmup_timeout_secs),
+            dicfuse.store.wait_for_ready(),
+        )
+        .await
+        {
+            Ok(_) => tracing::info!("Antares Dicfuse warmup completed"),
+            Err(_) => tracing::warn!(
+                "Antares Dicfuse warmup timed out after {}s",
+                warmup_timeout_secs
+            ),
+        }
+    });
+
+    Ok(())
+}
+
 /// Unmount and cleanup a job overlay filesystem.
 ///
 /// # Arguments
 /// * `job_id` - The job identifier to unmount
 ///
 /// # Returns
 /// The `AntaresConfig` of the unmounted job if it existed.
 #[allow(dead_code)]
 pub async fn unmount_job(job_id: &str) -> Result<Option<AntaresConfig>, DynError> {
     tracing::debug!("Unmounting Antares job: job_id={}", job_id);
     get_manager()