Glob Expansion, Presence validation for Provide Artifacts #85
Using new-type semantics for expanded/validated paths with a loose contract, and exporting to the upload process.
Asking for an early review to validate the architecture, and maybe style. Not asking for a look at the functionality because it's untested and almost certainly broken -- I'll make sure it passes new tests and the context from fail-state is usable before asking for that review.

My understanding of `validate` is that it's currently called before we do our business, and is kept somewhat separate. Globbing is a problem because we can't validate until after we're done globbing. I think any implementation will inevitably clash with the current model because of that.

What I'm trying this time: a new-type wrapper on paths that we've glob-expanded and validated. TOCTOU is a huge issue if we were to try to store these long-term, but I think our only current workflow is a pretty direct build & upload, which should be fine? Ideally, internal functions would require the new-type while keeping friction low when exposing them to external functions.

Big question is whether this will be O.K. to slap in with the current way we do validation, or perhaps we could even cut scope if we can't easily think of a cleaner way. I personally like offering a contract with a wrapping type, but I'm not sure if it fits in well enough.
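For readers skimming the thread, the contract idea sketches out roughly like this. A minimal sketch with hypothetical names (`ValidatedPath`, `validate`) and placeholder containment logic, not the PR's actual implementation; the real version would also canonicalize and check readability:

```rust
use std::path::{Path, PathBuf};

/// Newtype whose existence proves the wrapped path passed validation.
/// Internal functions that take `&ValidatedPath` inherit that guarantee.
pub struct ValidatedPath(PathBuf);

impl ValidatedPath {
    /// Hypothetical constructor: only paths under `base` are accepted.
    pub fn validate(base: &Path, candidate: &Path) -> Option<Self> {
        candidate
            .starts_with(base)
            .then(|| ValidatedPath(candidate.to_path_buf()))
    }
}

/// Low-friction hand-off to external APIs that want `&Path`.
impl AsRef<Path> for ValidatedPath {
    fn as_ref(&self) -> &Path {
        &self.0
    }
}
```

Because the wrapper's field is private, the only way to obtain one is through the validating constructor, which is the whole contract.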
Kajmany
left a comment
Annotated smaller bits I'm curious about. Just asking for a skim for now.
```rust
// FIXME: this was a quick hack but other semantics might be preferable
// need to make it easy to expose internals for external APIs
impl AsRef<Path> for InputPath {
    fn as_ref(&self) -> &Path {
        &self.0
    }
}
```
This was a quick and dirty way to expose the internals. Wondering if there are better semantics? Ideally we'd remove friction when passing it to external APIs that want `&Path`, without losing the requirement that internal functions take `&InputPath`.
This makes sense for how to handle the conversion to me!
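In practice the `AsRef` impl means external APIs that are generic over path-like arguments accept the newtype directly, while internal signatures still demand the wrapper. A sketch with hypothetical helper names (`external_api`, `internal_only`):

```rust
use std::path::{Path, PathBuf};

pub struct InputPath(PathBuf);

impl AsRef<Path> for InputPath {
    fn as_ref(&self) -> &Path {
        &self.0
    }
}

/// Stand-in for an external API that is generic over path-like
/// arguments, the way many std::fs functions are.
fn external_api(p: impl AsRef<Path>) -> String {
    p.as_ref().display().to_string()
}

/// Internal function: the parameter type itself is the contract.
fn internal_only(p: &InputPath) -> &Path {
    p.as_ref()
}
```

Callers can pass an `InputPath` wherever `impl AsRef<Path>` is expected, but cannot pass a bare `PathBuf` where `&InputPath` is required.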
```rust
    chal.directory, &provide
);

let docker = docker().await?;
```
Does this do anything? Couldn't tell if it's just unused or if it warms up the daemon or something
Yeah, looks unused. Probably a holdover from before the artifacts/docker were split out. The daemon client will get instantiated on first use regardless of where it's called from, so there's no reason to call it here if it's not used.
```rust
// TODO: verify readability (can stat even if we can't read, duh)

/// For **local** (non-container) asset paths which expands globs
/// but blocks references that are outside the base config directory.
///
/// Recursively descends into folders found as a result of globbing or single specification
/// and accumulates each result as a separate path.
// Ow! my stack!
///
/// Folders, files, and symlinks to them are accepted. All symlinks will be
/// resolved and replaced in returned paths.
///
/// ---
/// Glob-matching is attempted only on paths which do not exist.
pub fn process_all_paths(chal_dir: &Path, paths: &[PathBuf]) -> Result<Vec<InputPath>> {
    // anything above this dir would be leaving the directory
    let canonical_chal_dir = fs::canonicalize(chal_dir)?;
    let base_cfg_dir = canonical_chal_dir
        .parent()
        .and_then(|p| p.parent())
        .ok_or_else(|| anyhow!("Project root not found (CWD is too shallow)"))?;

    let resolved: Vec<InputPath> = paths.iter().try_fold(Vec::new(), |mut acc, path| {
        match std::fs::canonicalize(path) {
            // Assume paths we can resolve aren't globs
            Ok(p) => {
                let stat = fs::metadata(&p)?;
                if stat.is_dir() {
                    // Recursively run on directory contents, hope we don't explode
                    let dentry_paths = fs::read_dir(&p)?
                        .map(|dr| dr.map(|d| d.path()))
                        .collect::<Result<Vec<PathBuf>, _>>()?;
                    acc.extend(process_all_paths(chal_dir, &dentry_paths)?);
                } else if stat.is_file() {
                    acc.push(InputPath::fast_validate_single(&base_cfg_dir, &p)?);
                } else {
                    bail!("Expected canonicalized asset path to be file or directory but {p:?} is not");
                }
                Ok(acc)
            }
            // Try paths we can't locate as globs before failing
            Err(e) if e.kind() == std::io::ErrorKind::NotFound => {
                let abs_path = std::path::absolute(path)?;
                let expanded = expand_one_glob(&abs_path)?;
                acc.extend(process_all_paths(chal_dir, &expanded)?);
                Ok(acc)
            }
            Err(e) => Err(e).with_context(|| format!("failed to canonicalize path {:?}", path)),
        }
    })?;
    Ok(resolved)
}
```
This is going to naturally be a mess. The idea is to scrape folders, follow symlinks, and process globs in a recursive manner. There are external libraries to handle the folders, but I figured integrating them would be more of a mess than iterating dentries, and probably not that much safer?
Definitely incomplete, BTW. Wouldn't look at it seriously until unit testing is done, but drawing attention to it now.
It took me a couple readings to fully parse what this was doing -- would it be cleaner to have the input/output-path helpers return an enum of PathExists/PathFree/Error to represent the three cases here? It feels like we're mixing the path validation into these different places instead of handling the check in one place.
It is currently "failing fast" and validating as we go -- I thought this would be more readable (perf wasn't really the concern), but it sounds like it's doing the opposite.
I'm going to basically re-write this function and most likely make validation separate from the path collection, because it sounds like that'll be more legible. The only problem I can think of that this would introduce is accidentally specifying a directory outside the challenge and getting bogged down. I'll probably use a helper that keeps some state on depth/items accumulated to protect the stack/time for that case (and absurd crawls in-repo, too).
I will keep your other comments on style & itertools in mind during the rewrite. Thanks!
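For reference, the three-way enum suggested above could look something like this. A sketch with hypothetical names (`PathProbe`, `probe`), not the planned implementation:

```rust
use std::io;
use std::path::{Path, PathBuf};

/// One classification for the three outcomes the match currently
/// distinguishes inline, so callers branch in a single place.
enum PathProbe {
    /// The path resolved; carries its canonical form.
    Exists(PathBuf),
    /// Nothing exists at the path; it may be a glob pattern.
    Free,
    /// Any other I/O failure while resolving.
    Error(io::Error),
}

fn probe(path: &Path) -> PathProbe {
    match std::fs::canonicalize(path) {
        Ok(p) => PathProbe::Exists(p),
        Err(e) if e.kind() == io::ErrorKind::NotFound => PathProbe::Free,
        Err(e) => PathProbe::Error(e),
    }
}
```

With this shape, `process_all_paths` would match on `PathProbe` and never inspect `io::ErrorKind` itself, keeping the existence check in one place.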
```diff
 "assets/{chal_slug}/{file}",
 chal_slug = chal.directory.to_string_lossy(),
-file = file.file_name().unwrap().to_string_lossy()
+file = file.as_ref().file_name().unwrap().to_string_lossy()
```
I know the branch is behind, but is there more needed to integrate with upload? Only other effective change I can think of is that InputPath is guaranteed to be canonicalized.
Yeah, this should be just fine. Since this is only pulling off the filename, the canonicalization doesn't matter.
detjensrobert
left a comment
I would break out the new structs and glob helpers into another module here since the artifacts file is getting pretty large with all this.
```rust
/// A path to a readable file that existed within the base directory.
///
/// Vulnerable to TOCTOU issues. Create and consume as close to usage as possible.
pub struct InputPath(PathBuf);
```
I'd prefer to name this ValidatedPath or ExpandedPath since this is what is returned post-validation. InputPath reads more as an un-validated version of Path.
After poking around I see the Input here refers to input files for the archive/rename -- maybe ValidSourcePath and ValidDestPath then?
```rust
///
/// Recursively descends into folders found as a result of globbing or single specification
/// and accumulates each result as a separate path.
// Ow! my stack!
```
```rust
// any file at this path is a no-no
if fs::symlink_metadata(&abs_output).is_ok() {
    bail!("Output path {output:?} already exists");
}
```
The user should be able to force overwrite with a CLI flag, like when rerunning and it would be clobbering previously generated files.
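One way the guard could accommodate that, sketched with a hypothetical `force` parameter wired to a CLI flag (names assumed, not the repo's actual CLI):

```rust
use std::path::Path;

/// Refuse to clobber an existing output unless the user opted in;
/// `force` would come from a CLI flag such as `--force`.
fn check_output(output: &Path, force: bool) -> Result<(), String> {
    // symlink_metadata succeeds for files, dirs, and dangling symlinks alike
    if std::fs::symlink_metadata(output).is_ok() && !force {
        return Err(format!(
            "Output path {output:?} already exists (pass --force to overwrite)"
        ));
    }
    Ok(())
}
```

Keeping the flag a plain `bool` parameter keeps the path logic testable without pulling argument parsing into this module.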
```rust
    .ok_or_else(|| anyhow!("Project root not found (CWD is too shallow)"))?;

let resolved: Vec<InputPath> = paths.iter().try_fold(Vec::new(), |mut acc, path| {
    match std::fs::canonicalize(path) {
```
Should the only place we are trying to canonicalize the path be in the path validate function? It feels weird to be doing it multiple times. Can we use that here, or break it out from the structs into a separate helper?
```rust
files.iter().try_fold(Vec::new(), |mut acc, f| {
    let paths = expand_one_glob(f)?;
    acc.extend(paths);
    Ok::<Vec<PathBuf>, anyhow::Error>(acc)
})
```
`map().flatten_ok().collect()` I think looks cleaner than `try_fold` with `extend()`s scattered in the middle:
```diff
-files.iter().try_fold(Vec::new(), |mut acc, f| {
-    let paths = expand_one_glob(f)?;
-    acc.extend(paths);
-    Ok::<Vec<PathBuf>, anyhow::Error>(acc)
-})
+files
+    .iter()
+    .map(|f| expand_one_glob(f))
+    .flatten_ok()
+    .collect()
```
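To see the shape of the suggestion without the itertools dependency: `flatten_ok` adapts an iterator of `Result<Vec<T>, E>` into one of `Result<T, E>` lazily, while a std-only equivalent buffers the inner `Vec`s and then flattens. A runnable sketch with a hypothetical stand-in for `expand_one_glob`:

```rust
/// Hypothetical stand-in for expand_one_glob: each expansion can fail.
fn expand_one(pattern: &str) -> Result<Vec<String>, String> {
    if pattern.is_empty() {
        return Err("empty pattern".to_string());
    }
    Ok(vec![format!("{pattern}.a"), format!("{pattern}.b")])
}

/// itertools' flatten_ok does this in one lazy adapter; with std alone
/// you can collect the nested Vecs (short-circuiting on Err) and flatten.
fn expand_all(patterns: &[&str]) -> Result<Vec<String>, String> {
    let nested: Vec<Vec<String>> = patterns
        .iter()
        .map(|p| expand_one(p))
        .collect::<Result<_, _>>()?;
    Ok(nested.into_iter().flatten().collect())
}
```

Either way, the first `Err` stops the whole expansion, matching the `?`-per-item behavior of the original `try_fold`.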
```rust
// but we're not exporting these bad semantics outside this module to be fair
let pattern = file.to_string_lossy();
// TODO: will need more context!
let paths = glob::glob(&pattern)?.collect::<Result<Vec<_>, _>>()?;
```
Itertools has try_collect for this exact thing ;):
```diff
-let paths = glob::glob(&pattern)?.collect::<Result<Vec<_>, _>>()?;
+let paths = glob::glob(&pattern)?.try_collect()?;
```
```rust
    .and_then(|p| p.parent())
    .ok_or_else(|| anyhow!("Project root not found (CWD is too shallow)"))?;

let resolved: Vec<InputPath> = paths.iter().try_fold(Vec::new(), |mut acc, path| {
```
Ditto on the fold/map here
```rust
impl InputPath {
    /// Yield canonical path if the input resolves to one readable file
    /// located within base repo directory
    pub fn validate_single(base: &Path, input: &Path) -> Result<Self> {
```
Since these are making an InputPath from the source, validate_from() might be a better name. validate_single reads as if it is not returning anything, and from() is the Rust custom for this sort of conversion.
The type enforcement makes sense to me, more so than doing the validation in
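The naming suggestion could land roughly like this; a sketch with placeholder containment logic, not the PR's real checks. (Note `TryFrom` itself takes a single argument, so with a separate `base` parameter an associated constructor in the `try_from` spirit fits better than the trait.)

```rust
use std::path::{Path, PathBuf};

pub struct InputPath(PathBuf);

impl InputPath {
    /// `validate_from` reads as a fallible conversion in the spirit of
    /// `from`/`try_from`; the containment check here is placeholder logic.
    pub fn validate_from(base: &Path, input: &Path) -> Result<Self, String> {
        if input.starts_with(base) {
            Ok(InputPath(input.to_path_buf()))
        } else {
            Err(format!("{input:?} escapes base directory {base:?}"))
        }
    }
}
```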
Our original motivation was allowing authors to use Unix-like globs when specifying artifacts to upload, but this also seems like a good opportunity to validate that files are present and readable within the challenge directory, where possible.
This is all currently done at build-time, and a subset of the functionality will be called by `validate` that is not dependent on build-time things. Will update description/tasks as this is closer to being finished.
Closes #54
TODO: