Skip to content

Conversation

@edoardolincetto
Copy link

Closes #63

This PR adds support for validating file references (logos, images and authorsFile) in publiccode.yml files for generic Git repositories that are not hosted on supported platforms (GitHub, GitLab and Bitbucket).

  • Adds --allow-local-git-clone CLI flag to enable the feature.
  • Adds allowLocalGitClone Parser field to enable the feature.
  • Implements sparse Git cloning for efficient file existence checks.
  • Repositories are cached per-session to avoid redundant clones.
  • Automatic cleanup of temporary cloned repositories.
  • Falls back to standard HTTP checks for URLs when Git cloning fails.

@bfabio
Copy link
Member

bfabio commented Oct 31, 2025

Thanks @edoardolincetto, reviewing ASAP!

@edoardolincetto
Copy link
Author

@bfabio I've also drafted some tests. Do you think testing individual utilities is enough, or should we test the entire process with a dedicated test repository? The testing repo should be hosted outside GitHub, GitLab and Bitbucket.

Copy link
Member

@bfabio bfabio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay @edoardolincetto.

git_helper.go Outdated
}

// Initialize sparse checkout
cmd = exec.Command("git", "sparse-checkout", "init", "--cone")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates a dependency on git and I'd rather not. Let's use a library here so it has a chance to work in WASM as well (if we implement storage for the clone someway)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've migrated to go-git to remove the dependency on git. Since go-git doesn't currently support partial clones, the new implementation is slower (especially for large repositories) but it should be fine for typical use cases.

config := publiccode.ParserConfig{BaseURL: *localBasePathPtr}
config.DisableNetwork = *disableNetworkPtr
config.DisableExternalChecks = *disableExternalChecksPtr
config.AllowLocalGitClone = *allowLocalGitClonePtr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at this point it makes sense to refactor those options to be more ergonomic.

--no-network, --no-external-checks and this new --allow-local-git-clone are interconnected and they could be used in contradicting ways, eg. --no-external-checks used together with --allow-local-git-clone.

Let's make a new option --external-checks with an argument.

  • --external-checks=none
  • --external-checks=local
  • --external-checks=clone

Care to make a PR for the first two, so that we can implement --external-checks=clone afterwards?

// fileExists returns true if the file resource exists.
func (p *Parser) fileExists(u url.URL, network bool) (bool, error) {
// Returns true if the file resource exists.
func (p *Parser) fileExists(u url.URL, network bool, isGitRepo bool) (bool, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of the signature of this function, but I suppose that with the --external-checks refactor we'd consolidate the two bools into a enum.

Something like `fileExists(url url.URL, checkMode CheckMode).

git_helper.go Outdated
}

// Checks if a file exists in a Git repository by attempting to check it out.
func (g *gitHelper) fileExistsInRepo(repoURL string, filePath string) (bool, string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also handle the case where the publiccode.yml is already in a cloned repo.

git_helper.go Outdated
// Checks if an URL is a generic Git repository URL.
// Returns false for supported hosting platforms (GitHub, GitLab, Bitbucket)
// which have web interfaces and should not use local Git cloning.
func isGitURL(u *url.URL) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting on this function, but it's a general remark.

isGitURL, isGitRepo, are misleading. We should pick an alternative way of expressing the concept: They're always git repos in all cases, and it might be confusing to follow the code with this naming.

git_helper.go Outdated
return false, "", err
}

// Try to checkout the file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of comments like this one. Was this PR AI assisted?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consider supporting git repos with no code hosting provider

2 participants