Added Independent Finding Execution Behavior #1030

andrecsilva · 2025-03-24T12:00:20Z

Overview

Adds independent finding behavior for codemods

Description

Independent finding behavior means that codemods will execute once for each finding (i.e. remediation), as opposed to once for all findings (i.e. hardening). Hardening aims to produce a file with all the changes, while remediation will produce diffs for each finding. Remediation is the default behavior accessible with the codemodder command, while hardening is now accessible as codemodder_hardening.
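The difference between the two modes can be sketched roughly as follows. This is a toy illustration, not the codemodder implementation: `Finding`, `apply_fix`, `make_diff`, `run_hardening`, and `run_remediation` are hypothetical names, and the "fix" simply upper-cases the flagged line.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical stand-in for a single tool finding."""
    rule_id: str
    line: int  # 1-indexed line the finding points at


def apply_fix(lines: list[str], findings: list[Finding]) -> list[str]:
    """Toy 'codemod': upper-case every line flagged by a finding."""
    flagged = {f.line for f in findings}
    return [
        text.upper() if number in flagged else text
        for number, text in enumerate(lines, start=1)
    ]


def make_diff(before: list[str], after: list[str]) -> str:
    """Unified diff between two versions of the same file."""
    return "".join(difflib.unified_diff(before, after, "a", "b"))


def run_hardening(lines: list[str], findings: list[Finding]) -> str:
    """One execution over all findings: a single combined diff."""
    return make_diff(lines, apply_fix(lines, findings))


def run_remediation(lines: list[str], findings: list[Finding]) -> dict[Finding, str]:
    """One execution per finding: an independent diff for each finding."""
    return {f: make_diff(lines, apply_fix(lines, [f])) for f in findings}
```

Each remediation diff is computed against the original file, so applying one does not assume that any other has been applied.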

Additional Details

By default, remediation does not write any files, including dependency files.
Find-and-fix codemods remain accessible under the remediation behavior, but they behave the same as under hardening. The only exception is codemods that use semgrep scans for findings.
Most tests with 2+ changes are now being tested for remediation behavior. I've found this gives us a healthy balance of testing for both behaviors while avoiding duplicating each test.

drdavella

Overall this looks good. My comments boil down to two basic issues:

We should preserve the current hardening behavior as the default for now
I'm not totally convinced that locations are being handled correctly for the hardening case

drdavella · 2025-03-25T16:45:22Z

src/codemodder/codemodder.py

    sast_only: bool = False,
    ai_client: bool = True,
    log_matched_files: bool = False,
+    hardening: bool = False,


Suggested change

-    hardening: bool = False,
+    hardening: bool = True,

drdavella · 2025-03-26T19:42:19Z

.github/workflows/codemod_pygoat.yml

          path: pygoat
      - name: Run Codemodder
-        run: codemodder --dry-run --output output.codetf pygoat
+        run: codemodder_hardening --dry-run --output output.codetf pygoat


I think we should preserve the existing behavior of the command line for now.

drdavella · 2025-03-26T19:42:50Z

integration_tests/test_dependency_manager.py


        command = [
-            "codemodder",
+            "codemodder_hardening",


Let's keep the existing behavior for now.

drdavella · 2025-03-26T19:43:17Z

integration_tests/test_multiple_codemods.py


        command = [
-            "codemodder",
+            "codemodder_hardening",


Preserve existing behavior for now.

drdavella · 2025-03-26T19:44:35Z

pyproject.toml


 [project.scripts]
 codemodder = "codemodder.codemodder:main"
+codemodder_hardening = "codemodder.codemodder:harden"


I don't think this is necessary. We expect per-finding mode to be accessed exclusively by client library calls for now and we should preserve the existing codemodder behavior as the default in the near term.

We do use the new command line script for integration tests.

drdavella · 2025-03-26T19:45:41Z

src/codemodder/codemodder.py



-def _run_cli(original_args) -> int:
+def _run_cli(original_args, hardening=False) -> int:


Suggested change

-def _run_cli(original_args, hardening=False) -> int:
+def _run_cli(original_args, hardening=True) -> int:

drdavella · 2025-03-26T19:45:59Z

src/codemodder/codemodder.py

    return status


+def harden():


I don't think this is necessary; see comments above.

drdavella · 2025-03-26T19:46:40Z

src/codemodder/codemods/base_codemod.py

-            contexts.extend([process_file(file) for file in files_to_analyze])
-        else:
+        # Do each result independently and outputs the diffs
+        if not hardening:


This if/else feels like each branch should maybe be a separate function. It would probably enable the call to be a one-liner as well.
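One way the split could look, as a rough sketch only. The function names, the `ProcessFile` signature, and the use of plain strings for files and results are all assumptions made for illustration; the real code in `base_codemod.py` works with richer types.

```python
from typing import Callable, Sequence

# Hypothetical signature: process one file against a set of results and
# return a per-file context (a plain string here, for illustration).
ProcessFile = Callable[[str, Sequence[str]], str]


def apply_hardening(
    files: Sequence[str], results: Sequence[str], process_file: ProcessFile
) -> list[str]:
    """Process each file once, with every result in scope."""
    return [process_file(file, results) for file in files]


def apply_remediation(
    files: Sequence[str], results: Sequence[str], process_file: ProcessFile
) -> list[str]:
    """Process each file once per result, one result at a time."""
    return [process_file(file, [result]) for result in results for file in files]


def apply(
    files: Sequence[str],
    results: Sequence[str],
    process_file: ProcessFile,
    hardening: bool = True,
) -> list[str]:
    """The call site collapses to a one-line dispatch."""
    strategy = apply_hardening if hardening else apply_remediation
    return strategy(files, results, process_file)
```

With each branch in its own function, the two behaviors can also be unit-tested separately.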

drdavella · 2025-03-26T20:16:21Z

src/codemodder/codemods/base_codemod.py

+                    singleton = results.__class__()
+                    singleton.add_result(result)
+                    result_locations = self.get_files_to_analyze(context, singleton)
+                    # We do an execution for each location in the result


I don't think this is correct. We only need to execute each fix per finding. A finding may report multiple locations, but will still result in only a single fix (which itself may touch multiple locations). I think the distinction is subtle but we still execute each codemod only once per finding.

Let me know whether there's something I'm missing or whether maybe the comment isn't quite right.

Two things to note: our transformers are file-based, that is, they execute per file, and a ChangeSet object only reports changes to a single file.

This particular piece of code mimics the behavior we had before. If you only had one result with multiple locations, you would run the codemod for each location and produce a changeset object for each location.

What changed is that now each changeset is only associated with a single finding.
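A rough model of that relationship, assuming a finding's locations are grouped by file and that each file yields one changeset. `Location`, `Finding`, `ChangeSet`, and `changesets_for` are simplified, hypothetical stand-ins here, not the actual codemodder types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    file: str
    line: int


@dataclass(frozen=True)
class Finding:
    id: str
    locations: tuple[Location, ...]


@dataclass(frozen=True)
class ChangeSet:
    """Changes to exactly one file, tied to exactly one finding."""
    path: str
    finding_id: str
    lines: tuple[int, ...]


def changesets_for(finding: Finding) -> list[ChangeSet]:
    """Group one finding's locations by file and emit one changeset per file."""
    lines_by_file: dict[str, list[int]] = {}
    for location in finding.locations:
        lines_by_file.setdefault(location.file, []).append(location.line)
    return [
        ChangeSet(path, finding.id, tuple(lines))
        for path, lines in lines_by_file.items()
    ]
```

Under this model a finding with locations in two files produces two changesets, and both carry the same single finding id.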

clavedeluna

A few code comments seem to explain the code unnecessarily (likely LLM-generated) and are worth removing.

clavedeluna · 2025-03-26T16:20:10Z

pyproject.toml


 [project.scripts]
 codemodder = "codemodder.codemodder:main"
+codemodder_hardening = "codemodder.codemodder:harden"


for consistency with other commands, I suggest codemodder-hardening

clavedeluna · 2025-03-26T22:54:02Z

src/codemodder/codemods/test/utils.py


        path_exclude = [f"{tmp_file_path}:{line}" for line in lines_to_exclude or []]

+        print(expected_diff_per_change)


Suggested change

-        print(expected_diff_per_change)

drdavella

Approved and looks good but please consider making the change to clean up creation of singleton result sets.

drdavella · 2025-04-11T18:12:29Z

src/codemodder/codemods/base_codemod.py

+        if results:
+            for result in results.results_for_rules(rules):
+                # this need to be the same type of ResultSet as results
+                singleton = results.__class__()


Sorry for such late feedback but it feels like this could be done more cleanly by implementing a class method on the ResultSet type:

    T = TypeVar("T", bound=Result)  # we may already have this somewhere

    @classmethod
    def from_single_result(cls, result: T) -> Self:
        new = cls()
        new.add_result(result)  # add to the new instance, not to the class
        return new

Alternatively there may be a way to make it a method of Result instead, I'm not sure which is cleaner. In either case it avoids the introspection and feels like better encapsulation.
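For illustration, a self-contained version of the class-method approach might look like the sketch below. `Result`, `ResultSet`, and `SemgrepResultSet` are deliberately simplified stand-ins, not the actual codemodder classes, and a bound `TypeVar` is used in place of `Self` only so the sketch also runs on Python versions older than 3.11.

```python
from dataclasses import dataclass
from typing import TypeVar

RS = TypeVar("RS", bound="ResultSet")


@dataclass(frozen=True)
class Result:
    """Simplified stand-in for a single finding."""
    rule_id: str


class ResultSet:
    """Simplified stand-in for a collection of results."""

    def __init__(self) -> None:
        self.results: list[Result] = []

    def add_result(self, result: Result) -> None:
        self.results.append(result)

    @classmethod
    def from_single_result(cls: type[RS], result: Result) -> RS:
        """Build a set of the same concrete type holding only `result`."""
        new = cls()
        new.add_result(result)
        return new


class SemgrepResultSet(ResultSet):
    """Hypothetical tool-specific subclass."""


# The subclass type is preserved without inspecting `results.__class__`.
singleton = SemgrepResultSet.from_single_result(Result("rule-1"))
```

Because `cls` is the class the method was called on, a subclass gets an instance of its own type, which is what the `results.__class__()` call was doing by introspection.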

sonarqubecloud · 2025-04-14T11:23:32Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
2.0% Duplication on New Code

andrecsilva requested review from clavedeluna and drdavella as code owners March 24, 2025 12:00

andrecsilva marked this pull request as draft March 24, 2025 12:00

andrecsilva marked this pull request as ready for review March 25, 2025 12:19

drdavella requested changes Mar 26, 2025

clavedeluna reviewed Mar 26, 2025

andrecsilva force-pushed the independent-execution branch from e943f3c to 5c37c17 Compare March 28, 2025 13:25

andrecsilva requested review from clavedeluna and drdavella April 2, 2025 16:26

andrecsilva force-pushed the independent-execution branch 3 times, most recently from 0ff1b1d to 7585715 Compare April 9, 2025 11:06

drdavella approved these changes Apr 11, 2025

andrecsilva added 17 commits April 14, 2025 07:59

Codemods will now execute once for each finding independently

d7d0c9a

Adjusted tests to account for individual changes diff

874743b

Fixed a few more unit tests

66fd2f3

Fixed a few more tests

9b3e827

Fixed more unit tests

abe872a

Fixed more unit tests

141cde2

Added hardening script

1aa72fd

Added integration tests for new behavior

a8e59a2

Added file rewritten check

64aef85

Fixed some integration tests

222e649

More integration tests converted

3df2907

Fixed a few more tests

22a55bb

Remove leftover debugging code

e0d2b81

Fixed broken dependency and some docstrings

f5bdf33

Fixed pygoat workflow file

139926d

Fixed some intermittent tests

22dd701

Some refactoring

233974a

andrecsilva added 10 commits April 14, 2025 07:59

More refactoring

301379b

Small refactoring

ae4b6b5

Reverted default behavior to hardening

1bf27c6

Fixed pygoat test

2ab7c0b

Refactored hardening and remediation behavior

04f5a99

Fixed skipping logic

e994695

Fixed asserts in integration tests with sonar issues

0582137

Removed debugging code

18cd5e7

Downgraded pydantic version and bumped sarif-pydantic

63e302a

Added method to create ResultSets from the same type

bb95515

andrecsilva force-pushed the independent-execution branch from 7585715 to bb95515 Compare April 14, 2025 11:22

andrecsilva added this pull request to the merge queue Apr 14, 2025

Merged via the queue into main with commit 3dde585 Apr 14, 2025
14 checks passed

andrecsilva deleted the independent-execution branch April 14, 2025 11:40



3 participants