Description
I did some initial exploratory work on this in a script ages back; I can't remember whether it was in the chatgpt-source-watch or the udio-source-watch repo, and I'm not sure if it ever got committed or if it's still just sitting somewhere locally.
The general gist of this issue is that between webpack (or similar) builds, the chunk identifiers sometimes get renamed, which can mess up our diffing. Often it's relatively easy to see/guess the renames by looking at the diffs themselves (e.g. in the _buildManifest.js / webpack.js files), but it's then a semi-manual process of renaming these so that the diffs line up correctly. (I believe I also wrote some scripts to assist with this at some point, probably alongside the one mentioned earlier, but similarly they may not have been committed anywhere yet.)
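As a rough illustration of automating the "guess the renames from the manifest" step, here is a minimal Python sketch. It assumes each build's manifest has already been parsed into a hypothetical route → list-of-chunk-paths dict, and that chunk order within a route stays stable between builds; both are assumptions. `infer_renames_from_manifest` is a hypothetical helper, not one of the scripts mentioned above.

```python
from typing import Dict, List, Set, Tuple

Manifest = Dict[str, List[str]]  # hypothetical shape: route -> ordered chunk paths


def infer_renames_from_manifest(old: Manifest, new: Manifest) -> Dict[str, str]:
    """Guess old->new chunk renames by aligning each route's chunk list positionally."""
    # Chunks that still exist in the new build were clearly not renamed.
    all_new: Set[str] = {c for chunks in new.values() for c in chunks}
    votes: Dict[Tuple[str, str], int] = {}
    for route, old_chunks in old.items():
        new_chunks = new.get(route)
        # Only align routes that exist in both builds with the same chunk count.
        if new_chunks is None or len(new_chunks) != len(old_chunks):
            continue
        for o, n in zip(old_chunks, new_chunks):
            if o != n and o not in all_new:
                votes[(o, n)] = votes.get((o, n), 0) + 1
    renames: Dict[str, str] = {}
    # Prefer the pairing seen on the most routes; keep the mapping one-to-one.
    for (o, n), _count in sorted(votes.items(), key=lambda kv: -kv[1]):
        if o not in renames and n not in renames.values():
            renames[o] = n
    return renames
```

For example, if a route listed `chunks/123.js` in the old build and `chunks/789.js` in the same position in the new one (and `chunks/123.js` no longer appears anywhere), the sketch proposes `{"chunks/123.js": "chunks/789.js"}`, which could then drive the file renames before diffing.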
Similarly, sometimes the chunk identifiers themselves may not have changed, but the module identifiers, and/or which chunk each module lives in, may have moved around. This causes similar issues with diffing and with identifying what is actually new code vs code that has just been moved around.
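One way to separate "moved" modules from genuinely "new" ones is to fingerprint each module's source independently of its module ID and chunk. The sketch below assumes the modules have already been extracted into a hypothetical `{chunk: {module_id: source}}` mapping per build (pulling them out of real webpack output would need a proper JS parser); `fingerprint` and `classify_modules` are illustrative names only.

```python
import hashlib
import re
from typing import Dict, List, Set

Build = Dict[str, Dict[str, str]]  # hypothetical shape: chunk -> {module id -> module source}


def fingerprint(source: str) -> str:
    # Collapse whitespace so pure reformatting doesn't change the hash.
    normalized = re.sub(r"\s+", " ", source).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify_modules(old: Build, new: Build) -> Dict[str, List[str]]:
    """Label each new-build module as unchanged, moved (same code, other chunk) or new."""
    # Remember which old chunks each distinct module body appeared in.
    old_locations: Dict[str, Set[str]] = {}
    for chunk, modules in old.items():
        for source in modules.values():
            old_locations.setdefault(fingerprint(source), set()).add(chunk)

    result: Dict[str, List[str]] = {"unchanged": [], "moved": [], "new": []}
    for chunk, modules in new.items():
        for module_id, source in modules.items():
            label = f"{chunk}:{module_id}"
            locations = old_locations.get(fingerprint(source))
            if locations is None:
                result["new"].append(label)
            elif chunk in locations:
                result["unchanged"].append(label)
            else:
                result["moved"].append(label)
    return result
```

Exact hashing is brittle here: if a module references other modules by ID, renaming those IDs changes its hash, which is part of why a fuzzier similarity measure is attractive.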
The idea here is basically to use embeddings / similarity search / etc. to compare the chunk files (which is what my initial script does), or the modules within them (a more recent idea I had for further enhancing this), to find the closest match. That then lets us infer, in a programmatic/automated way, whether a chunk is likely to have been renamed, after which we can handle it appropriately.
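A minimal sketch of the matching step, using character k-gram shingles and Jaccard similarity as a simple, dependency-free stand-in for embeddings. The `k=5` shingle size, the `0.5` threshold and the greedy one-to-one pairing are all hypothetical choices, not taken from the original script.

```python
from typing import Dict, Set, Tuple


def shingles(text: str, k: int = 5) -> Set[str]:
    # Character k-grams over whitespace-collapsed text.
    compact = " ".join(text.split())
    if len(compact) < k:
        return {compact} if compact else set()
    return {compact[i:i + k] for i in range(len(compact) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match_chunks(
    old: Dict[str, str], new: Dict[str, str], threshold: float = 0.5
) -> Dict[str, Tuple[str, float]]:
    """Greedily pair each old chunk with its most similar unused new chunk."""
    old_sh = {name: shingles(src) for name, src in old.items()}
    new_sh = {name: shingles(src) for name, src in new.items()}
    # Score every old/new pair, best first.
    pairs = sorted(
        ((jaccard(o, n), on, nn) for on, o in old_sh.items() for nn, n in new_sh.items()),
        reverse=True,
    )
    matches: Dict[str, Tuple[str, float]] = {}
    used_new: Set[str] = set()
    for score, on, nn in pairs:
        if score < threshold:
            break  # everything after this is even less similar
        if on in matches or nn in used_new:
            continue
        matches[on] = (nn, score)
        used_new.add(nn)
    return matches
```

Any matched pair whose names differ is a candidate rename, e.g. `{o: n for o, (n, _) in match_chunks(old, new).items() if o != n}`. Comparing every old chunk against every new one is quadratic, which is fine for a few hundred chunks; for larger builds, an embedding index or MinHash/LSH would be the natural next step. The same function could be pointed at per-module sources instead of whole chunks.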
See Also
- Deobfuscating / Unminifying Obfuscated Web App / JavaScript Code (0xdevalias' gist)
  - Subsection: Fingerprinting Minified JavaScript Libraries
- Fingerprinting Minified JavaScript Libraries / AST Fingerprinting / Source Code Similarity / Etc (0xdevalias' gist)
- Integrate wakaru and/or webcrack JS unminimisers (and maybe humanify too) #2