Skip to content

Conversation

@andrecsilva
Copy link
Contributor

Overview

Adds independent finding behavior for codemods

Description

Independent finding behavior means that codemods will execute once for each finding (i.e. remediation), as opposed to once for all findings (i.e. hardening). Hardening aims to produce a file with all the changes, while remediation will produce diffs for each finding. Remediation is the default behavior accessible with the codemodder command, while hardening is now accessible as codemodder_hardening.

Additional Details

  • Remediation won't write any files as the default behavior, this includes the dependencies.
  • While find and fix codemods will still be accessible over the remediation behavior, there is no difference over the hardening behavior. The only exception being codemods that use semgrep scans for findings.
  • Most tests with 2+ changes are now being tested for remediation behavior. I've found this gives us a healthy balance of testing for both behaviors while avoiding duplicating each test.

@andrecsilva andrecsilva marked this pull request as draft March 24, 2025 12:00
@andrecsilva andrecsilva marked this pull request as ready for review March 25, 2025 12:19
Copy link
Member

@drdavella drdavella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good. My comments boil down to two basic issues:

  • We should preserve the current hardening behavior as the default for now
  • I'm not totally convinced that locations are being handled correctly for the hardening case

sast_only: bool = False,
ai_client: bool = True,
log_matched_files: bool = False,
hardening: bool = False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
hardening: bool = False,
hardening: bool = True,

path: pygoat
- name: Run Codemodder
run: codemodder --dry-run --output output.codetf pygoat
run: codemodder_hardening --dry-run --output output.codetf pygoat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should preserve the existing behavior of the command line for now.


command = [
"codemodder",
"codemodder_hardening",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the existing behavior for now.


command = [
"codemodder",
"codemodder_hardening",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preserve existing behavior for now.

pyproject.toml Outdated

[project.scripts]
codemodder = "codemodder.codemodder:main"
codemodder_hardening = "codemodder.codemodder:harden"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary. We expect per-finding mode to be accessed exclusively by client library calls for now and we should preserve the existing codemodder behavior as the default in the near term.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do use the new command line script for integration tests.



def _run_cli(original_args) -> int:
def _run_cli(original_args, hardening=False) -> int:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _run_cli(original_args, hardening=False) -> int:
def _run_cli(original_args, hardening=True) -> int:

return status


def harden():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary; see comments above.

contexts.extend([process_file(file) for file in files_to_analyze])
else:
# Do each result independently and outputs the diffs
if not hardening:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if/else feels like each branch should maybe be a separate function. It would probably enable the call to be a one-liner as well.

singleton = results.__class__()
singleton.add_result(result)
result_locations = self.get_files_to_analyze(context, singleton)
# We do an execution for each location in the result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct. We only need to execute each fix per finding. A finding may report multiple locations, but will still result in only a single fix (which itself may touch multiple locations). I think the distinction is subtle but we still execute each codemod only once per finding.

Let me know whether there's something I'm missing or whether maybe the comment isn't quite right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things to note: our transformers are file-based, that is, they execute per file. ChangeSet objects only reports changes to a single file.

This particular piece of code mimics the behavior we had before. If you only had one result with multiple locations, you would run the codemod for each location and produce a changeset object for each location.

What changed is that now each changeset is only associated with a single finding.

Copy link
Contributor

@clavedeluna clavedeluna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few code comments that seem unnecessarily explaining code (LLM generated likely) that are worth removing

pyproject.toml Outdated

[project.scripts]
codemodder = "codemodder.codemodder:main"
codemodder_hardening = "codemodder.codemodder:harden"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency with other commands, I suggest codemodder-hardening


path_exclude = [f"{tmp_file_path}:{line}" for line in lines_to_exclude or []]

print(expected_diff_per_change)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(expected_diff_per_change)

@andrecsilva andrecsilva force-pushed the independent-execution branch from e943f3c to 5c37c17 Compare March 28, 2025 13:25
@andrecsilva andrecsilva force-pushed the independent-execution branch 3 times, most recently from 0ff1b1d to 7585715 Compare April 9, 2025 11:06
Copy link
Member

@drdavella drdavella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved and looks good but please consider making the change to clean up creation of singleton result sets.

if results:
for result in results.results_for_rules(rules):
# this need to be the same type of ResultSet as results
singleton = results.__class__()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for such late feedback but it feels like this could be done more cleanly by implementing a class method on the ResultSet type:

T = TypeVar("T", bound=Result) # we may already have this somewhere
@classmethod
def from_single_result(cls, result: T) -> Self:
    new = cls()
    cls.add_result(result)
    return new

Alternatively there may be a way to make it a method of Result instead, I'm not sure which is cleaner. In either case it avoids the introspection and feels like better encapsulation.

@andrecsilva andrecsilva force-pushed the independent-execution branch from 7585715 to bb95515 Compare April 14, 2025 11:22
@sonarqubecloud
Copy link

@andrecsilva andrecsilva added this pull request to the merge queue Apr 14, 2025
Merged via the queue into main with commit 3dde585 Apr 14, 2025
14 checks passed
@andrecsilva andrecsilva deleted the independent-execution branch April 14, 2025 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants