-
Notifications
You must be signed in to change notification settings - Fork 56
chore: add validation for module source URLs #406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml index 53b912b..eb3cf8b 100644 --- a/.github/workflows/ci.yaml +++ b/.github/workflows/ci.yaml @@ -63,8 +63,8 @@ jobs: - name: Set up Go uses: actions/setup-go@v5 with: - go-version: "1.23.2" - - name: Validate contributors + go-version: "1.25.0" + - name: Validate Reademde run: go build ./cmd/readmevalidation && ./readmevalidation - name: Remove build file artifact run: rm ./readmevalidation Signed-off-by: Muhammad Atif Ali <[email protected]>
98efa49
to
898e773
Compare
- Downgrade Go version in CI to 1.24 for consistency. - Fix naming and path issues in `readmevalidation` code. - Improve regex validation for module and namespace names. - Correct typos and improve comments for clarity.
898e773
to
3b6b1ba
Compare
@@ -63,8 +63,8 @@ jobs: | |||
- name: Set up Go | |||
uses: actions/setup-go@v5 | |||
with: | |||
go-version: "1.23.2" | |||
- name: Validate contributors | |||
go-version: "1.24" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was having weird issues with running Golang CI locally, and had to match the Go version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests generated by gemini CLI
@@ -25,7 +25,7 @@ var ( | |||
terraformVersionRe = regexp.MustCompile(`^\s*\bversion\s+=`) | |||
|
|||
// Matches the format "> [!INFO]". Deliberately using a broad pattern to catch formatting issues that can mess up | |||
// the renderer for the Registry website | |||
// the renderer for the Registry website. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these .
are added by running golangci-lint
locally, and I don't know how I can remove them. The CI also passes so I assume its fine leaving them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, I feel like there should be a way to enforce this during the CI step, but this was my fault for not linting before opening one of the recent PRs
errs = append(errs, xerrors.New("registry does not support nested GFM alerts")) | ||
continue | ||
} | ||
|
||
leadingWhitespace := currentMatch[1] | ||
if len(leadingWhitespace) != 1 { | ||
errs = append(errs, errors.New("GFM alerts must have one space between the '>' and the start of the GFM brackets")) | ||
errs = append(errs, xerrors.New("GFM alerts must have one space between the '>' and the start of the GFM brackets")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are also done by golangci-lint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is close. I think there's a couple of changes we can make the code a bit better, though
@@ -25,7 +25,7 @@ var ( | |||
terraformVersionRe = regexp.MustCompile(`^\s*\bversion\s+=`) | |||
|
|||
// Matches the format "> [!INFO]". Deliberately using a broad pattern to catch formatting issues that can mess up | |||
// the renderer for the Registry website | |||
// the renderer for the Registry website. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, I feel like there should be a way to enforce this during the CI step, but this was my fault for not linting before opening one of the recent PRs
cmd/readmevalidation/codermodules.go
Outdated
"strings" | ||
|
||
"golang.org/x/xerrors" | ||
) | ||
|
||
var ( | ||
terraformSourceRe = regexp.MustCompile(`^\s*source\s*=\s*"([^"]+)"`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a good idea for what source fields typically look like? I feel like this regex is too broad (the [^"]+
alone feels like it's too permissive – I don't we should be supporting any arbitrary character that isn't a double quote)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing all of these on the Terraform docs site:
https://developer.hashicorp.com/terraform/language/modules/sources
I think I'd prefer to have more explicit patterns for each source type example listed there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, also, could we add a comment saying what general pattern we expect this to match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last update: now that I've looked at the rest of this file, if we're always going to use registry.coder.com/
in the source, we could bake that into the regex pattern
cmd/readmevalidation/codermodules.go
Outdated
terraformSourceRe = regexp.MustCompile(`^\s*source\s*=\s*"([^"]+)"`) | ||
) | ||
|
||
func extractNamespaceAndModuleFromPath(filePath string) (namespace string, moduleName string, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I know the code works, but I feel like it's a little confusing have to differentiate between the all-lowercase filepath
package, versus the camelcase filePath
parameter. Could we rename the latter to just path
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But going further, I feel like this information will probably end up being useful for other validation logic in the future (for modules or templates). Could we update the logic so that:
- The
coderResourceReadme
type gets updated to includenamespace
andresourceName
fields extractNamespaceAndModuleFromPath
is deleted, and instead has its logic inlined inside ofparseCoderResourceReadme
, which makes sure to add those fields to the struct before returning the value in the success casevalidateModuleSourceURL
is updated to remove thefilePath
parameter, and it instead receives the resource name and namespace from the struct
?
cmd/readmevalidation/codermodules.go
Outdated
return namespace, moduleName, nil | ||
} | ||
|
||
func validateModuleSourceURL(body string, filePath string) []error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish I knew more about Coder templates. Do templates have a source
field (or anything equivalent) that might need to be validated, too?
cmd/readmevalidation/codermodules.go
Outdated
if strings.HasPrefix(nextLine, "```") { | ||
if strings.HasPrefix(nextLine, "```tf") && firstTerraformBlock { | ||
isInsideTerraform = true | ||
firstTerraformBlock = false | ||
} else if isInsideTerraform { | ||
// End of first terraform block. | ||
break | ||
} | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be rewritten as:
if strings.HasPrefix(nextLine, "```") {
if strings.HasPrefix(nextLine, "```tf") {
isInsideTerraform = true
continue
}
if isInsideTerraform {
break
}
}
I don't see the point of the firstTerraformBlock
variable. If we always break at the end of the first Terraform code block, we don't really need to track anything extra
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is to check all tf blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something obvious, then. What validation requirement(s) are we trying to fulfill?
I was under the impression that we wanted the first Terraform block to contain a source field, but that other sources could be valid for the other blocks. Are we trying to keep all sources fully "boxed in" for Coder templates?
Not saying that the current code is wrong, but I'm not fully sure what we're trying to protect against, so I just want to make sure I'm giving good feedback
if len(errs) != 1 { | ||
t.Errorf("Expected 1 error, got %d: %v", len(errs), errs) | ||
} | ||
if len(errs) > 0 && !contains(errs[0].Error(), "did not find correct source URL") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant condition here, too
t.Run("Module name with hyphens vs underscores", func(t *testing.T) { | ||
t.Parallel() | ||
|
||
body := "# Test Module\n\n```tf\nmodule \"test_module\" {\n source = \"registry.coder.com/test-namespace/test-module/coder\"\n version = \"1.0.0\"\n agent_id = coder_agent.example.id\n}\n```\n" | ||
filePath := "registry/test-namespace/modules/test-module/README.md" | ||
errs := validateModuleSourceURL(body, filePath) | ||
if len(errs) != 0 { | ||
t.Errorf("Expected no errors for hyphen/underscore variation, got: %v", errs) | ||
} | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this test is checking for. I know we previously had concerns around modules having underscores in the slug – is that trying to catch that?
Because if so, I feel like we need to actually have underscores in the source strings
t.Run("Invalid file path format", func(t *testing.T) { | ||
t.Parallel() | ||
|
||
body := "# Test Module" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to introduce a little bit more control into the test, I feel like we should make sure the body is fully valid, though if we apply the changes I mentioned above, I don't know if we need this test anymore
return len(s) >= len(substr) && (s == substr || (len(s) > len(substr) && | ||
(s[:len(substr)] == substr || s[len(s)-len(substr):] == substr || | ||
indexOfSubstring(s, substr) >= 0))) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this separate function when we have strings.Contains
?
if s[i:i+len(substr)] == substr { | ||
return i | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This technically works, but re-slicing the string on each iteration feels a bit janky
- Make regex more specific for registry.coder.com patterns only - Refactor to add namespace and resourceName fields to coderResourceReadme struct - Inline path parsing logic into parseCoderResourceReadme - Update validateModuleSourceURL to use struct fields instead of filePath parameter - Simplify Terraform block detection logic - Reduce nesting with early continue statements - Add comment explaining regex pattern - Extract registry.coder.com into a constant - Improve test readability with extracted variables - Remove redundant checks in tests - Replace custom contains function with strings.Contains Co-authored-by: matifali <[email protected]>
Would prevents issues like the one I fixed in #404
This change should have caught that error as