
Explore using embeddings/similar to identify/track similar chunks/modules even when renamed #15

@0xdevalias


I did some initial exploratory work for this in a script a while back. I can't remember whether it was in the chatgpt-source-watch or udio-source-watch repo, and I'm not sure whether it was ever committed or is still just sitting somewhere locally.

The general gist of this issue is that between webpack (or similar) builds, the chunk identifiers are sometimes renamed, which can mess up our diffing. It's often relatively easy to see or guess the renames by looking at the diffs themselves (eg. in the _buildManifest.js / webpack.js files), but it's then a semi-manual process to rename the chunks so that they align and the diffs look correct. (I believe I also wrote some scripts to assist with this at some point, probably alongside the one mentioned earlier, but they similarly may not have been committed anywhere yet.)

Similarly, sometimes the chunk identifiers themselves may not have changed, but the module identifiers may have changed and/or the modules may have moved to a different chunk. This causes similar issues with diffing, and with identifying what is actually new code versus code that has just been moved around.

The idea here is basically to use embeddings / similarity search / etc to compare the chunk files (which is what my initial script does), or the modules within them (a more recent idea I had for further enhancing this), to find the closest match. That lets us infer in a programmatic/automated way whether a chunk or module is likely to have been renamed, after which we can handle it appropriately.
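As a rough illustration of the chunk-level comparison (not the original script, which may never have been committed), here is a minimal sketch. It assumes the chunk contents have already been loaded into dicts keyed by filename, and it swaps in a simple bag-of-tokens count vector with cosine similarity as a stand-in for real embeddings. The names `vectorize`, `cosine`, `guess_renames`, and the `0.8` threshold are all hypothetical, picked only for this example.

```python
import math
import re
from collections import Counter

# Crude JS-ish tokenizer: identifiers, numbers, or any single non-space char.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")


def vectorize(source: str) -> Counter:
    """Stand-in for an embedding: a bag-of-tokens count vector."""
    return Counter(TOKEN_RE.findall(source))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def guess_renames(old_chunks, new_chunks, threshold=0.8):
    """Guess {old_name: new_name} for chunks that look renamed.

    Only names that disappeared are compared against names that appeared.
    Matching is greedy (in sorted order of the old names), and the
    threshold is an arbitrary assumption that would need tuning.
    """
    removed = sorted(old_chunks.keys() - new_chunks.keys())
    candidates = {
        name: vectorize(new_chunks[name])
        for name in new_chunks.keys() - old_chunks.keys()
    }
    renames = {}
    for old_name in removed:
        if not candidates:
            break
        old_vec = vectorize(old_chunks[old_name])
        best = max(candidates, key=lambda n: cosine(old_vec, candidates[n]))
        if cosine(old_vec, candidates[best]) >= threshold:
            renames[old_name] = best
            del candidates[best]  # each new chunk is matched at most once
    return renames
```

The same approach should carry over to the module-level idea by keying the dicts on module identifiers and passing each module's source instead of whole chunk files. Token counts are sensitive to minifier variable renaming, so a real embedding model (or some normalisation of identifiers first) would likely be more robust; this just shows the shape of the matching step.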

