ai fixer precommit #5320

Draft
kvenkatachala333 wants to merge 1 commit into GoogleCloudPlatform:develop from kvenkatachala333:ai_precommit

@kvenkatachala333
Member

The AI Pre-commit Fixer is a proposed tool within the cluster-toolkit (likely via gcluster) that automatically identifies, diagnoses, and resolves pre-commit failures. Instead of developers manually fixing linting errors, formatting issues, or simple logic bugs, this tool acts as an autonomous agent that clears the "pre-commit debt" before code is pushed.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new feature to the Cluster Toolkit: an AI-powered pre-commit fixer. This tool aims to streamline the development workflow by automatically addressing common pre-commit failures, such as linting errors or formatting issues, before code is pushed. It integrates with Google Cloud's Vertex AI to generate and apply fixes, reducing manual intervention and accelerating the commit process.

Highlights

  • New AI Command Group: A new top-level ai command group has been introduced to the gcluster CLI, serving as a container for AI-powered tools.
  • AI Pre-commit Fixer: A fix-pre-commits subcommand has been added under the ai group, designed to automatically identify and resolve pre-commit failures using Google Cloud's Vertex AI.
  • Vertex AI Integration: Core logic for interacting with Vertex AI has been implemented, including authentication, prompt generation, and parsing AI responses to apply code fixes.
  • Pre-commit Output Parsing: A robust parser was developed to interpret the output of pre-commit hooks, extracting specific failure details like file paths, line numbers, and error messages.
  • Comprehensive Documentation: Detailed documentation (docs/ai_pre_commit_usage.md) has been added, covering prerequisites, installation, usage examples, customization options, and troubleshooting for the new AI pre-commit fixer.
Changelog
  • cmd/ai.go
    • Added new ai command group to the gcluster CLI.
  • cmd/fix_pre_commits.go
    • Implemented the fix-pre-commits subcommand under the ai command.
    • Integrated the AI fixer logic, allowing automatic resolution of pre-commit failures.
    • Defined command-line flags for max-retries, verbose, region, and model.
  • docs/ai_pre_commit_usage.md
    • Added new documentation detailing the usage, prerequisites, installation, and troubleshooting for the AI-powered pre-commit fixer.
  • pkg/ai/client.go
    • Created an AI client for interacting with Google Cloud's Vertex AI.
    • Implemented methods for generating code fixes based on pre-commit failures.
    • Included logic for gcloud authentication and project ID retrieval.
  • pkg/ai/fixer.go
    • Developed the core Fixer logic to orchestrate running pre-commit hooks and applying AI-generated fixes.
    • Included retry mechanisms and verbose logging options.
  • pkg/ai/parser.go
    • Implemented a parser to extract detailed failure information (file, line, message, hook) from pre-commit hook output.
  • pkg/ai/parser_test.go
    • Added unit tests to validate the functionality of the pre-commit output parser.
Activity
  • No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a promising new feature, the AI-powered pre-commit fixer, with a well-structured implementation and integrated command-line interface. However, it has significant security vulnerabilities related to how it handles untrusted input from pre-commit hooks and file contents. The most critical issue is a path traversal vulnerability that allows arbitrary file read and write on the user's system, and prompt construction is vulnerable to prompt injection. Beyond security, improvements are needed for robustness, testability, and maintainability, including correcting unit test assertions, improving API response parsing safety, resolving configuration inconsistencies, and adding unit tests for new packages.

}

func (f *Fixer) fixFailure(failure Failure) error {
    content, err := os.ReadFile(failure.File)

security-high high

A critical path traversal vulnerability exists: the failure.File path, parsed from untrusted pre-commit output in pkg/ai/parser.go, is used directly in os.ReadFile without validation, allowing arbitrary file reads. This fixer package also lacks unit tests, which are essential for maintainability and correctness, as per the project's style guide (Rule 7). Please address the path validation and add comprehensive unit tests for fixer.go and client.go, mocking external dependencies.

var failures []Failure
lines := strings.Split(output, "\n")
modified := false
commonErrorRegex := regexp.MustCompile(`^([^:\s]+):(\d+):(?:(\d+):)?\s*(.*)$`)

security-high high

The regular expression used to parse file paths from pre-commit output is too permissive. It matches any sequence of characters that are not colons or spaces, which can include absolute paths (e.g., /etc/passwd) or paths with directory traversal sequences (e.g., ../../etc/passwd). If a malicious or compromised pre-commit hook (or a hook whose output is influenced by untrusted data) outputs a line matching this pattern, it can trick the tool into reading and overwriting arbitrary files on the user's system in subsequent steps.
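
To illustrate the concern, the following is a minimal sketch that reuses the pattern quoted above and then filters the captured path with `filepath.IsLocal` (Go 1.20+), which rejects absolute paths and paths that escape the current directory through `..`. The helper name `parsePath` and the sample lines are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// Same pattern as quoted in the review comment.
var commonErrorRegex = regexp.MustCompile(`^([^:\s]+):(\d+):(?:(\d+):)?\s*(.*)$`)

// parsePath (hypothetical helper) returns the file path captured from one
// line of hook output. ok is false when the line does not match the pattern
// or when the captured path is absolute or climbs out of the directory.
func parsePath(line string) (path string, ok bool) {
	m := commonErrorRegex.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	if !filepath.IsLocal(m[1]) {
		return "", false
	}
	return filepath.Clean(m[1]), true
}

func main() {
	// Hypothetical hook output lines; all three match the regex.
	lines := []string{
		"pkg/shell/terraform.go:23:2: ineffectual assignment to err",
		"/etc/passwd:1: injected",
		"../../etc/passwd:1:1: injected",
	}
	for _, line := range lines {
		p, ok := parsePath(line)
		fmt.Printf("%q -> path=%q accepted=%v\n", line, p, ok)
	}
}
```

`filepath.IsLocal` is purely lexical, so it does not account for symlinks inside the repository; it narrows what the parser accepts but is not a complete defense on its own.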

return fmt.Errorf("AI generation failed: %w", err)
}

if err := os.WriteFile(failure.File, []byte(fixedContent), 0644); err != nil {

security-high high

The failure.File path, which is parsed from untrusted pre-commit output in pkg/ai/parser.go, is used directly in os.WriteFile without validation. This allows an attacker to overwrite arbitrary files on the user's system with AI-generated content if they can influence the output of a pre-commit hook. Combined with the arbitrary file read in os.ReadFile, this is a critical security risk.
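
One possible shape for the missing validation is sketched below: resolve the parsed path against a trusted root and refuse anything that lands outside it before calling `os.ReadFile` or `os.WriteFile`. The helper `resolveInRoot` is hypothetical, and the sketch assumes the tool runs from the repository root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolveInRoot (hypothetical helper) joins p onto root and returns the
// absolute result only if it stays inside root. It does not resolve
// symlinks; a stricter version could call filepath.EvalSymlinks first.
func resolveInRoot(root, p string) (string, error) {
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return "", err
	}
	if filepath.IsAbs(p) {
		return "", fmt.Errorf("absolute path not allowed: %q", p)
	}
	full := filepath.Join(absRoot, p) // Join also cleans ".." segments
	rel, err := filepath.Rel(absRoot, full)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path escapes repository root: %q", p)
	}
	return full, nil
}

func main() {
	root, err := os.Getwd() // assumption: the current directory is the repo root
	if err != nil {
		panic(err)
	}
	for _, p := range []string{"pkg/ai/fixer.go", "../../etc/passwd", "/etc/passwd"} {
		full, err := resolveInRoot(root, p)
		fmt.Printf("%q -> %q, err=%v\n", p, full, err)
	}
}
```

Calling the same helper before both the read and the write keeps the two code paths from drifting apart.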

Comment on lines +42 to +55
expected := []Failure{
    {
        File: "pkg/shell/terraform.go",
        Line: 0, // regex doesn't parse line number to int yet, but string check
        Message: "[GolangCI Lint] ineffectual assignment to err (ineffassign)",
        Hook: "GolangCI Lint",
    },
    {
        File: "pkg/shell/terraform_test.go",
        Line: 0,
        Message: "[GolangCI Lint] unknown field (typecheck)",
        Hook: "GolangCI Lint",
    },
}

high

The test TestParseFailures incorrectly asserts that the parsed line number is 0. The parser implementation correctly uses fmt.Sscanf to extract the line numbers 23 and 10 from the test data. The test assertions should be updated to reflect the correct expected values. An incorrect test provides a false sense of security and makes future refactoring difficult.

Suggested change
- expected := []Failure{
-     {
-         File: "pkg/shell/terraform.go",
-         Line: 0, // regex doesn't parse line number to int yet, but string check
-         Message: "[GolangCI Lint] ineffectual assignment to err (ineffassign)",
-         Hook: "GolangCI Lint",
-     },
-     {
-         File: "pkg/shell/terraform_test.go",
-         Line: 0,
-         Message: "[GolangCI Lint] unknown field (typecheck)",
-         Hook: "GolangCI Lint",
-     },
- }
+ expected := []Failure{
+     {
+         File: "pkg/shell/terraform.go",
+         Line: 23,
+         Message: "[GolangCI Lint] ineffectual assignment to err (ineffassign)",
+         Hook: "GolangCI Lint",
+     },
+     {
+         File: "pkg/shell/terraform_test.go",
+         Line: 10,
+         Message: "[GolangCI Lint] unknown field (typecheck)",
+         Hook: "GolangCI Lint",
+     },
+ }
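
As a rough illustration of why the expected line number should be 23 rather than 0, the sketch below parses one sample line into a `Failure`. It is not the PR's parser: the review says the real implementation uses `fmt.Sscanf`, whereas this sketch swaps in `strconv.Atoi`. The input line and the `parseLine` helper are assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Failure mirrors the fields visible in the test above.
type Failure struct {
	File    string
	Line    int
	Message string
	Hook    string
}

// Pattern quoted in the review of pkg/ai/parser.go.
var commonErrorRegex = regexp.MustCompile(`^([^:\s]+):(\d+):(?:(\d+):)?\s*(.*)$`)

// parseLine (hypothetical helper) converts one "file:line:col: message"
// line into a Failure, prefixing the message with the hook name.
func parseLine(hook, line string) (Failure, bool) {
	m := commonErrorRegex.FindStringSubmatch(line)
	if m == nil {
		return Failure{}, false
	}
	n, err := strconv.Atoi(m[2]) // m[2] is the captured line number
	if err != nil {
		return Failure{}, false
	}
	return Failure{
		File:    m[1],
		Line:    n,
		Message: fmt.Sprintf("[%s] %s", hook, m[4]),
		Hook:    hook,
	}, true
}

func main() {
	// Assumed sample input; the PR's actual test data is not shown here.
	f, ok := parseLine("GolangCI Lint",
		"pkg/shell/terraform.go:23:2: ineffectual assignment to err (ineffassign)")
	fmt.Printf("%+v ok=%v\n", f, ok)
}
```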

Comment on lines +67 to +77
prompt := fmt.Sprintf(`You are an expert software engineer.
The following file failed pre-commit hook '%s'.
Error message: '%s'.
File content:
%s

Please provide the corrected file content. Do not provide any markdown formatting, just the raw code.
If the file is a Go file, ensure it compiles and follows gofmt.
If the file is a Terraform file, ensure it follows terraform fmt.
Focus your fix on line %d and its immediate context. PRESERVE all other content exactly as is.
Return ONLY the full file content. Do NOT truncate. Do NOT use placeholders.`, failure.Hook, failure.Message, content, failure.Line)

security-medium medium

The prompt for the AI model is constructed by directly concatenating untrusted data, including the file content (content) and the error message (failure.Message). This makes the tool vulnerable to prompt injection. An attacker could craft a file or a hook error message that contains instructions to the AI, potentially causing it to generate malicious code or exfiltrate sensitive information. Since the AI's output is then written back to the file system, this could lead to code execution or system compromise.

fixPreCommitsCmd.Flags().IntVar(&maxRetries, "max-retries", 3, "Maximum number of retries per file")
fixPreCommitsCmd.Flags().BoolVarP(&verbose, "verbose", "v", false, "Enable verbose output")
fixPreCommitsCmd.Flags().StringVar(&region, "region", "us-central1", "Vertex AI region")
fixPreCommitsCmd.Flags().StringVar(&model, "model", "gemini-2.0-flash-001", "Vertex AI model")

medium

The default model gemini-2.0-flash-001 is defined here and also in pkg/ai/client.go. Additionally, pkg/ai/client.go contains an unused constant defaultModel = "gemini-1.5-pro". To improve maintainability and establish a single source of truth, these defaults should be consolidated. Consider defining the default in one place (e.g., a constant in the ai package) and referencing it from the command.
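
A minimal sketch of the consolidation might look like the following. To stay self-contained it uses the standard library `flag` package instead of the flag API shown in the PR, and it keeps everything in one file; in the real layout the constants would presumably sit in the `ai` package. The names `DefaultModel`, `DefaultRegion`, and `parseFlags` are illustrative assumptions.

```go
package main

import (
	"flag"
	"fmt"
)

// Single source of truth for defaults. In the PR these would live in the
// ai package and be imported by the command (assumed placement).
const (
	DefaultModel  = "gemini-2.0-flash-001"
	DefaultRegion = "us-central1"
)

type options struct {
	model  string
	region string
}

// parseFlags wires the flags to the shared constants, so changing a default
// requires touching only one place.
func parseFlags(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("fix-pre-commits", flag.ContinueOnError)
	fs.StringVar(&o.model, "model", DefaultModel, "Vertex AI model")
	fs.StringVar(&o.region, "region", DefaultRegion, "Vertex AI region")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseFlags([]string{"--region", "europe-west4"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("model=%s region=%s\n", o.model, o.region)
}
```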

Comment on lines +162 to +195
func parseVertexResponse(body io.Reader) (string, error) {
    var parsedResp map[string]interface{}
    if err := json.NewDecoder(body).Decode(&parsedResp); err != nil {
        return "", err
    }

    candidates, ok := parsedResp["candidates"].([]interface{})
    if !ok || len(candidates) == 0 {
        return "", fmt.Errorf("no candidates returned from AI")
    }

    candidate := candidates[0].(map[string]interface{})
    contentParts, ok := candidate["content"].(map[string]interface{})
    if !ok {
        return "", fmt.Errorf("unexpected response structure")
    }

    parts, ok := contentParts["parts"].([]interface{})
    if !ok || len(parts) == 0 {
        return "", fmt.Errorf("no content parts returned")
    }

    textPart, ok := parts[0].(map[string]interface{})
    if !ok {
        return "", fmt.Errorf("unexpected part structure")
    }

    text, ok := textPart["text"].(string)
    if !ok {
        return "", fmt.Errorf("text not found in response")
    }

    return text, nil
}

medium

The function parseVertexResponse uses a series of type assertions on map[string]interface{} to parse the JSON response from the Vertex AI API. This approach is brittle: most of the assertions are checked, but candidates[0].(map[string]interface{}) is not, so the function panics at runtime if the first candidate is not a JSON object.

Following Go best practices, it would be safer and cleaner to define structs that model the expected JSON response and unmarshal the data directly into them. This provides type safety and makes the code easier to read and maintain.

Example:

// Define structs that match the expected JSON structure
type VertexAIResponse struct {
    Candidates []Candidate `json:"candidates"`
}
type Candidate struct {
    Content Content `json:"content"`
}
// ... and so on

// Then, in parseVertexResponse:
var resp VertexAIResponse
if err := json.NewDecoder(body).Decode(&resp); err != nil {
    return "", err
}
// Safely access data, e.g., resp.Candidates[0].Content.Parts[0].Text

@kvenkatachala333 Are there plans to add documentation to the Cloud documentation (https://docs.cloud.google.com/cluster-toolkit/docs) for this feature? Thanks!

CC: @bytetwin
