ai fixer precommit#5320
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant new feature to the Cluster Toolkit: an AI-powered pre-commit fixer. This tool aims to streamline the development workflow by automatically addressing common pre-commit failures, such as linting errors or formatting issues, before code is pushed. It integrates with Google Cloud's Vertex AI to generate and apply fixes, reducing manual intervention and accelerating the commit process. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a promising new feature, the AI-powered pre-commit fixer, with a well-structured implementation and integrated command-line interface. However, it has significant security vulnerabilities related to how it handles untrusted input from pre-commit hooks and file contents. The most critical issue is a path traversal vulnerability that allows arbitrary file read and write on the user's system, and prompt construction is vulnerable to prompt injection. Beyond security, improvements are needed for robustness, testability, and maintainability, including correcting unit test assertions, improving API response parsing safety, resolving configuration inconsistencies, and adding unit tests for new packages.
| } | ||
|
|
||
| func (f *Fixer) fixFailure(failure Failure) error { | ||
| content, err := os.ReadFile(failure.File) |
There was a problem hiding this comment.
A critical path traversal vulnerability exists: the failure.File path, parsed from untrusted pre-commit output in pkg/ai/parser.go, is used directly in os.ReadFile without validation, allowing arbitrary file reads. This fixer package also lacks unit tests, which are essential for maintainability and correctness, as per the project's style guide (Rule 7). Please address the path validation and add comprehensive unit tests for fixer.go and client.go, mocking external dependencies.
| var failures []Failure | ||
| lines := strings.Split(output, "\n") | ||
| modified := false | ||
| commonErrorRegex := regexp.MustCompile(`^([^:\s]+):(\d+):(?:(\d+):)?\s*(.*)$`) |
There was a problem hiding this comment.
The regular expression used to parse file paths from pre-commit output is too permissive. It matches any sequence of characters that are not colons or spaces, which can include absolute paths (e.g., /etc/passwd) or paths with directory traversal sequences (e.g., ../../etc/passwd). If a malicious or compromised pre-commit hook (or a hook whose output is influenced by untrusted data) outputs a line matching this pattern, it can trick the tool into reading and overwriting arbitrary files on the user's system in subsequent steps.
| return fmt.Errorf("AI generation failed: %w", err) | ||
| } | ||
|
|
||
| if err := os.WriteFile(failure.File, []byte(fixedContent), 0644); err != nil { |
There was a problem hiding this comment.
The failure.File path, which is parsed from untrusted pre-commit output in pkg/ai/parser.go, is used directly in os.WriteFile without validation. This allows an attacker to overwrite arbitrary files on the user's system with AI-generated content if they can influence the output of a pre-commit hook. Combined with the arbitrary file read in os.ReadFile, this is a critical security risk.
| expected := []Failure{ | ||
| { | ||
| File: "pkg/shell/terraform.go", | ||
| Line: 0, // regex doesn't parse line number to int yet, but string check | ||
| Message: "[GolangCI Lint] ineffectual assignment to err (ineffassign)", | ||
| Hook: "GolangCI Lint", | ||
| }, | ||
| { | ||
| File: "pkg/shell/terraform_test.go", | ||
| Line: 0, | ||
| Message: "[GolangCI Lint] unknown field (typecheck)", | ||
| Hook: "GolangCI Lint", | ||
| }, | ||
| } |
There was a problem hiding this comment.
The test TestParseFailures incorrectly asserts that the parsed line number is 0. The parser implementation correctly uses fmt.Sscanf to extract the line numbers 23 and 10 from the test data. The test assertions should be updated to reflect the correct expected values. An incorrect test provides a false sense of security and makes future refactoring difficult.
| expected := []Failure{ | |
| { | |
| File: "pkg/shell/terraform.go", | |
| Line: 0, // regex doesn't parse line number to int yet, but string check | |
| Message: "[GolangCI Lint] ineffectual assignment to err (ineffassign)", | |
| Hook: "GolangCI Lint", | |
| }, | |
| { | |
| File: "pkg/shell/terraform_test.go", | |
| Line: 0, | |
| Message: "[GolangCI Lint] unknown field (typecheck)", | |
| Hook: "GolangCI Lint", | |
| }, | |
| } | |
| expected := []Failure{ | |
| { | |
| File: "pkg/shell/terraform.go", | |
| Line: 23, | |
| Message: "[GolangCI Lint] ineffectual assignment to err (ineffassign)", | |
| Hook: "GolangCI Lint", | |
| }, | |
| { | |
| File: "pkg/shell/terraform_test.go", | |
| Line: 10, | |
| Message: "[GolangCI Lint] unknown field (typecheck)", | |
| Hook: "GolangCI Lint", | |
| }, | |
| } |
| prompt := fmt.Sprintf(`You are an expert software engineer. | ||
| The following file failed pre-commit hook '%s'. | ||
| Error message: '%s'. | ||
| File content: | ||
| %s | ||
|
|
||
| Please provide the corrected file content. Do not provide any markdown formatting, just the raw code. | ||
| If the file is a Go file, ensure it compiles and follows gofmt. | ||
| If the file is a Terraform file, ensure it follows terraform fmt. | ||
| Focus your fix on line %d and its immediate context. PRESERVE all other content exactly as is. | ||
| Return ONLY the full file content. Do NOT truncate. Do NOT use placeholders.`, failure.Hook, failure.Message, content, failure.Line) |
There was a problem hiding this comment.
The prompt for the AI model is constructed by directly concatenating untrusted data, including the file content (content) and the error message (failure.Message). This makes the tool vulnerable to prompt injection. An attacker could craft a file or a hook error message that contains instructions to the AI, potentially causing it to generate malicious code or exfiltrate sensitive information. Since the AI's output is then written back to the file system, this could lead to code execution or system compromise.
| fixPreCommitsCmd.Flags().IntVar(&maxRetries, "max-retries", 3, "Maximum number of retries per file") | ||
| fixPreCommitsCmd.Flags().BoolVarP(&verbose, "verbose", "v", false, "Enable verbose output") | ||
| fixPreCommitsCmd.Flags().StringVar(®ion, "region", "us-central1", "Vertex AI region") | ||
| fixPreCommitsCmd.Flags().StringVar(&model, "model", "gemini-2.0-flash-001", "Vertex AI model") |
There was a problem hiding this comment.
The default model gemini-2.0-flash-001 is defined here and also in pkg/ai/client.go. Additionally, pkg/ai/client.go contains an unused constant defaultModel = "gemini-1.5-pro". To improve maintainability and establish a single source of truth, these defaults should be consolidated. Consider defining the default in one place (e.g., a constant in the ai package) and referencing it from the command.
| func parseVertexResponse(body io.Reader) (string, error) { | ||
| var parsedResp map[string]interface{} | ||
| if err := json.NewDecoder(body).Decode(&parsedResp); err != nil { | ||
| return "", err | ||
| } | ||
|
|
||
| candidates, ok := parsedResp["candidates"].([]interface{}) | ||
| if !ok || len(candidates) == 0 { | ||
| return "", fmt.Errorf("no candidates returned from AI") | ||
| } | ||
|
|
||
| candidate := candidates[0].(map[string]interface{}) | ||
| contentParts, ok := candidate["content"].(map[string]interface{}) | ||
| if !ok { | ||
| return "", fmt.Errorf("unexpected response structure") | ||
| } | ||
|
|
||
| parts, ok := contentParts["parts"].([]interface{}) | ||
| if !ok || len(parts) == 0 { | ||
| return "", fmt.Errorf("no content parts returned") | ||
| } | ||
|
|
||
| textPart, ok := parts[0].(map[string]interface{}) | ||
| if !ok { | ||
| return "", fmt.Errorf("unexpected part structure") | ||
| } | ||
|
|
||
| text, ok := textPart["text"].(string) | ||
| if !ok { | ||
| return "", fmt.Errorf("text not found in response") | ||
| } | ||
|
|
||
| return text, nil | ||
| } |
There was a problem hiding this comment.
The function parseVertexResponse uses a series of type assertions on map[string]interface{} to parse the JSON response from the Vertex AI API. This approach is brittle and can lead to runtime panics if the API response structure changes.
Following Go best practices, it would be safer and cleaner to define structs that model the expected JSON response and unmarshal the data directly into them. This provides type safety and makes the code easier to read and maintain.
Example:
// Define structs that match the expected JSON structure
type VertexAIResponse struct {
Candidates []Candidate `json:"candidates"`
}
type Candidate struct {
Content Content `json:"content"`
}
// ... and so on
// Then, in parseVertexResponse:
var resp VertexAIResponse
if err := json.NewDecoder(body).Decode(&resp); err != nil {
return "", err
}
// Safely access data, e.g., resp.Candidates[0].Content.Parts[0].TextThere was a problem hiding this comment.
@kvenkatachala333 Are there plans to add documentation to the Cloud documentation (https://docs.cloud.google.com/cluster-toolkit/docs) for this feature? Thanks!
CC: @bytetwin
The AI Pre-commit Fixer is a proposed tool within the cluster-toolkit (likely via gcluster) that automatically identifies, diagnoses, and resolves pre-commit failures. Instead of developers manually fixing linting errors, formatting issues, or simple logic bugs, this tool acts as an autonomous agent to clear the "pre-commit debt" before code push.