Simple diff analysis based on strings and identifiers

This is an idea for a type of analysis of code diffs. This issue is just for tracking notes and ideas.

## Example input 1
```diff
+            guidance: function (e) {
+              var t = e.isSearchable,
+                n = e.isMulti,
+                r = e.isDisabled,
+                i = e.tabSelectsValue;
+              switch (e.context) {
+                case "menu":
+                  return "Use Up and Down to choose options"
+                    .concat(
+                      r
+                        ? ""
+                        : ", press Enter to select the currently focused option",
+                      ", press Escape to exit the menu",
+                    )
+                    .concat(
+                      i
+                        ? ", press Tab to select the option and exit the menu"
+                        : "",
+                      ".",
+                    );
```
Here the extraction might be:
```
guidance
e
t
isSearchable
n
isMulti
isDisabled
i
tabSelectsValue
context
"menu"
"Use Up and Down to choose options"
concat
r
""
", press Enter to select the currently focused option"
", press Escape to exit the menu"
", press Tab to select the option and exit the menu"
"."
```

Of course, this gives you far less information than the original, but I think it could be a good trade-off when you want to skim a diff and don't have time to read all of it.
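A rough sketch of the extraction step, in Python to avoid a Node.js dependency. It uses a regular expression instead of a real JavaScript parser, so it is only an approximation: template literals, regex literals, and comments will be misread. The names `TOKEN_RE`, `JS_KEYWORDS`, and `extract_tokens` are made up for illustration, and the keyword list is deliberately partial.

```python
import re

# A quoted string literal (escapes allowed) or an identifier.
# Regex approximation only; this is not a JavaScript parser.
TOKEN_RE = re.compile(
    r'"(?:[^"\\\n]|\\.)*"'                  # double-quoted string
    r"|'(?:[^'\\\n]|\\.)*'"                 # single-quoted string
    r"|(?<![\w$])[A-Za-z_$][\w$]*"          # identifier or keyword
)

# Partial keyword list (an assumption); keywords are dropped from the output.
JS_KEYWORDS = frozenset({
    "var", "let", "const", "function", "return", "switch", "case", "break",
    "for", "while", "if", "else", "in", "of", "new", "this", "typeof",
    "null", "true", "false", "void", "default",
})


def extract_tokens(diff_text):
    """Return identifiers and string literals from added lines, first-seen order."""
    seen = set()
    tokens = []
    for line in diff_text.splitlines():
        # Keep only added lines; skip "+++ b/file" headers.
        # (An added line whose content itself starts with "++" is also skipped.)
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for match in TOKEN_RE.finditer(line[1:]):
            token = match.group(0)
            if token in JS_KEYWORDS or token in seen:
                continue
            seen.add(token)
            tokens.append(token)
    return tokens


if __name__ == "__main__":
    sample = '+  guidance: function (e) {\n+    var t = e.isSearchable;\n'
    print("\n".join(extract_tokens(sample)))
```

Each token is listed once, in the order it first appears, which matches the shape of the hand-written list above.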

## Example input 2
Since common generic tokens like e, t, n, "", and "." would show up frequently in any context, they would already have been seen in the past and would therefore be filtered out. You'd focus instead on strings such as "Use Up and Down to choose options", with some convenient way to jump back and see them in context in the code.
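The "already seen" filter could be as small as a set that persists across diffs. This is a minimal sketch under that assumption; `filter_new` and `seen` are hypothetical names, and how the set is stored between runs is left open.

```python
def filter_new(tokens, seen):
    """Return tokens not yet in `seen`, then record them for later diffs."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    fresh = [token for token in dict.fromkeys(tokens) if token not in seen]
    seen.update(fresh)
    return fresh


if __name__ == "__main__":
    # Tokens assumed to have been encountered in earlier diffs.
    seen = {"e", "t", "n", '""', '"."'}
    tokens = ["e", '"Use Up and Down to choose options"', '"."', "guidance"]
    print(filter_new(tokens, seen))  # only the string and "guidance" remain
    print(filter_new(tokens, seen))  # [] since everything is now seen
```

Because the set is updated on every call, a token is only ever surfaced the first time it appears.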

For input like the following:
```diff
+      var a = n(72843);
+      function s(e, t) {
+        for (var n = 0; n < t.length; n++) {
+          var r = t[n];
+          (r.enumerable = r.enumerable || !1),
+            (r.configurable = !0),
+            "value" in r && (r.writable = !0),
+            Object.defineProperty(e, (0, a.Z)(r.key), r);
+        }
+      }
+      function l(e, t, n) {
+        return (
+          t && s(e.prototype, t),
+          n && s(e, n),
+          Object.defineProperty(e, "prototype", { writable: !1 }),
+          e
+        );
+      }
```
Here, probably none of the names or strings would be new, so it wouldn't show up at all. This is intended: I can't glean any conclusions from looking at it, and would prefer not to see it.

## Glenn's comments
https://twitter.com/_devalias/status/1770284997385277554

> I think given the size of a lot of the JS files, and the diffs themselves; it would probably end up being a LOT of strings; which might be confusing when removed from the rest of the context of the surrounding code.

For large diffs I think it'd be a lot, but strings and names are a subset of the raw diff, so it should still be less work than a full manual analysis. The idea is to just visually filter through them until you see a name/string that looks interesting on its own, which could lead to something good in-context.

> It should be fairly easy to prototype a script using babel parser and babel traverse though.
> You would add a rule or couple to the traverse so that it matches on whatever strings are called in the AST; and then output them to console or a file or similar.

I haven't worked with Babel, but some relevant docs seem to be:
- https://babeljs.io/docs/babel-parser
- https://www.npmjs.com/package/%40babel/traverse

Are there other AST parsers too? Would something like Tree-sitter work? I'd generally prefer to avoid Node.js if it's not required.

> Then you would just diff that output file of strings between one build and the next.
> If code moves around between builds it might introduce its own form of noise (but maybe `git diff --color-moved` would handle that still anyway)

I haven't seen enough diffs to anticipate exactly what these would look like, but there might be different solutions like color-moved that could work depending on how it goes.
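One possible way around the moved-code noise, separate from color-moved, is to compare the two builds' token lists as multisets instead of line by line, so that order stops mattering. This is only a sketch, assuming each build has already been reduced to a flat list of tokens; `token_delta` is a hypothetical name.

```python
from collections import Counter


def token_delta(old_tokens, new_tokens):
    """Compare two token lists as multisets, ignoring where tokens appear."""
    old_counts = Counter(old_tokens)
    new_counts = Counter(new_tokens)
    # Counter subtraction keeps only positive counts.
    added = new_counts - old_counts
    removed = old_counts - new_counts
    return added, removed


if __name__ == "__main__":
    old = ["a", "b", "b", "c"]
    new = ["c", "b", "a", "d"]  # reordered, one "b" dropped, "d" added
    added, removed = token_delta(old, new)
    print(dict(added))    # {'d': 1}
    print(dict(removed))  # {'b': 1}
```

The trade-off is that position information is lost entirely, so jumping back to the code in context would need a separate lookup.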

> I also noticed you liked some of my tweets about my more generalised diff minimiser; which would reduce the noise of things a fair bit overall as well.
> I still need to polish that and commit/upload it; been super busy lately and haven’t had a chance to yet.

Related:

- https://github.com/0xdevalias/chatgpt-source-watch/issues/3

> Feel free to open an issue on the ChatGPT Source watch repo about the string extractor idea + link back to these tweets/copy the relevant info in.
> I’d be happy to give some more pointers about it and/or include it in the repo if you wanted to work on it.

Yeah, I want to make a prototype and see if it will kind of work. I'm still not sure about the implementation, though; the most efficient system might be to integrate with a text editor, which would make it harder to replicate.
