Use hash tables for FullName:Param matching in SimpleDiff#107
Use hash tables for FullName:Param matching in SimpleDiff#107clearbluejar merged 3 commits intoclearbluejar:mainfrom
Conversation
This way we can reduce the complexity of the matching phase from O(n^2) to O(n).
|
Hey! thanks for looking into this. SimpleDiff is the first "engine" I developed, realizing it is very efficient or accurate. I left it in as an example. I don't have any automated tests for it. They all run against the default https://github.com/clearbluejar/ghidriff/blob/main/ghidriff/version_tracking_diff.py . Let me think of couple of days of what to do. Your changes are a definite improvement, just wonder if I should remove the file completely. Do you actually use the SimpleDiff Engine? If so, what was your motivation? |
|
I used SimpleDiff for preliminary testing of the tool, and I think it is useful for that purpose, and basic sanity checks. I also had trouble with creating boilerplate for headless Version Tracking, that also played a part in choosing SimpleDiff first, to keep the test as simple as possible. For this reason (assuming the engine stays) it'd be important to avoid similar bugs that could lead to false conclusions about ghidriff overall. I think having some sample for prototyping new algorithms is also a good a idea. Overall, I think SimpleDiff has a place in the repo. |
|
I like this idea to keep it. Although I realize now that I don't have any tests. Would you be willing to add a test_simple_diff.py in tests and run a test like this? https://github.com/clearbluejar/ghidriff/blob/main/tests/test_diff.py#L32 You would need to change the Diff engine here: And update the numbers here to whatever SimpleDiff might generate: Lines 126 to 128 in 61e1ffa Otherwise the test would remain the same. Let me know if you can. Otherwise I will try to get to it a bit later. Ghidra 11.3 just came out and I'm working those updates. |
|
Sorry, somehow my GitHub notifications don't arrive in my mailbox lately... I'll take a look at the tests and let you know how it goes! |
|
I added the test to this PR, please take a look to see if this is what you wanted! The code is 99% copy-paste, I only had to tweak the modified count. Btw. most of the diff test code is duplicate, would you be interested in a PR that consolidates boilerplate into functions/fixtures? |
|
I'm OK with the test as is for now. Due to the complexity of dealing not only with python runtime, but the JVM. some things get a bit tricky to improve / worry too much about. That being said, let me know if there are any glaring things that could improve. The tests biggest value for me is to ensure that the functionality remains everyime Ghidra upgrades (and every other dependency jpype, pyghidra, etc.). I'll run the tests now. |
|
Looks like the new test passed. Great! For now. Lets just get this merged, happy to discuss further test improvements in future. |
This way we can reduce the complexity of the matching phase from O(n^2) to O(n). The original inner loop also didn't break if a match was found. The change can be significant in case of large diffs.