Skip to content

Use hash tables for FullName:Param matching in SimpleDiff#107

Merged
clearbluejar merged 3 commits intoclearbluejar:mainfrom
v-p-b:optimize_simple
Feb 17, 2025
Merged

Use hash tables for FullName:Param matching in SimpleDiff#107
clearbluejar merged 3 commits intoclearbluejar:mainfrom
v-p-b:optimize_simple

Conversation

@v-p-b
Copy link
Contributor

@v-p-b v-p-b commented Jan 26, 2025

This way we can reduce the complexity of the matching phase from O(n^2) to O(n). The original inner loop also didn't break if a match was found. The change can be significant in case of large diffs.

This way we can reduce the complexity of the matching phase
from O(n^2) to O(n).
@clearbluejar
Copy link
Owner

Hey! thanks for looking into this. SimpleDiff is the first "engine" I developed, realizing it is very efficient or accurate. I left it in as an example. I don't have any automated tests for it. They all run against the default https://github.com/clearbluejar/ghidriff/blob/main/ghidriff/version_tracking_diff.py .

Let me think of couple of days of what to do. Your changes are a definite improvement, just wonder if I should remove the file completely.

Do you actually use the SimpleDiff Engine? If so, what was your motivation?

@v-p-b
Copy link
Contributor Author

v-p-b commented Jan 30, 2025

I used SimpleDiff for preliminary testing of the tool, and I think it is useful for that purpose, and basic sanity checks. I also had trouble with creating boilerplate for headless Version Tracking, that also played a part in choosing SimpleDiff first, to keep the test as simple as possible. For this reason (assuming the engine stays) it'd be important to avoid similar bugs that could lead to false conclusions about ghidriff overall.

I think having some sample for prototyping new algorithms is also a good a idea.

Overall, I think SimpleDiff has a place in the repo.

@clearbluejar
Copy link
Owner

I like this idea to keep it. Although I realize now that I don't have any tests. Would you be willing to add a test_simple_diff.py in tests and run a test like this? https://github.com/clearbluejar/ghidriff/blob/main/tests/test_diff.py#L32

You would need to change the Diff engine here:
https://github.com/clearbluejar/ghidriff/blob/main/tests/test_diff.py#L91

And update the numbers here to whatever SimpleDiff might generate:

ghidriff/tests/test_diff.py

Lines 126 to 128 in 61e1ffa

assert len(pdiff['functions']['modified']) == 12
assert len(pdiff['functions']['added']) == 28
assert len(pdiff['functions']['deleted']) == 0

Otherwise the test would remain the same.

Let me know if you can. Otherwise I will try to get to it a bit later. Ghidra 11.3 just came out and I'm working those updates.

@v-p-b
Copy link
Contributor Author

v-p-b commented Feb 12, 2025

Sorry, somehow my GitHub notifications don't arrive in my mailbox lately... I'll take a look at the tests and let you know how it goes!

@v-p-b
Copy link
Contributor Author

v-p-b commented Feb 13, 2025

I added the test to this PR, please take a look to see if this is what you wanted! The code is 99% copy-paste, I only had to tweak the modified count.

Btw. most of the diff test code is duplicate, would you be interested in a PR that consolidates boilerplate into functions/fixtures?

@clearbluejar
Copy link
Owner

I'm OK with the test as is for now. Due to the complexity of dealing not only with python runtime, but the JVM. some things get a bit tricky to improve / worry too much about. That being said, let me know if there are any glaring things that could improve. The tests biggest value for me is to ensure that the functionality remains everyime Ghidra upgrades (and every other dependency jpype, pyghidra, etc.).

I'll run the tests now.

@clearbluejar
Copy link
Owner

Looks like the new test passed. Great! For now. Lets just get this merged, happy to discuss further test improvements in future.

@clearbluejar clearbluejar merged commit a4da2ab into clearbluejar:main Feb 17, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants