Skip to content

Conversation

@goldsteinn
Copy link
Contributor

@goldsteinn goldsteinn commented Oct 17, 2024

New proposed function clang-format-vc-diff.

It is the same as calling clang-format-region on all diffs between
the content of a buffer-file and the content of the file at git
revision HEAD. This is essentially the same thing as:
git-clang-format -f {filename}
If the current buffer is saved.

The motivation is many project (LLVM included) both have code that is
non-compliant with there clang-format style and disallow unrelated
format diffs in PRs. This means users can't just run
clang-format-buffer on the buffer they are working on, and need to
manually go through all the regions by hand to get them
formatted. This is both an error prone and annoying workflow.

@llvmbot
Copy link
Member

llvmbot commented Oct 17, 2024

@llvm/pr-subscribers-clang-format

Author: None (goldsteinn)

Changes

New proposed function clang-format-git-diffs.

It is the same as calling clang-format-region on all diffs between
the content of a buffer-file and the content of the file at git
revision HEAD. This is essentially the same thing as:
git-clang-format -f {filename}
If the current buffer is saved.

The motivation is many project (LLVM included) both have code that is
non-compliant with there clang-format style and disallow unrelated
format diffs in PRs. This means users can't just run
clang-format-buffer on the buffer they are working on, and need to
manually go through all the regions by hand to get them
formatted. This is both an error prone and annoying workflow.


Full diff: https://github.com/llvm/llvm-project/pull/112792.diff

1 Files Affected:

  • (modified) clang/tools/clang-format/clang-format.el (+143-14)
diff --git a/clang/tools/clang-format/clang-format.el b/clang/tools/clang-format/clang-format.el
index f3da5415f8672b..dfdef2260de06e 100644
--- a/clang/tools/clang-format/clang-format.el
+++ b/clang/tools/clang-format/clang-format.el
@@ -132,18 +132,97 @@ is a zero-based file offset, assuming ‘utf-8-unix’ coding."
     (lambda (byte &optional _quality _coding-system)
       (byte-to-position (1+ byte)))))
 
-;;;###autoload
-(defun clang-format-region (start end &optional style assume-file-name)
-  "Use clang-format to format the code between START and END according to STYLE.
-If called interactively uses the region or the current statement if there is no
-no active region. If no STYLE is given uses `clang-format-style'. Use
-ASSUME-FILE-NAME to locate a style config file, if no ASSUME-FILE-NAME is given
-uses the function `buffer-file-name'."
-  (interactive
-   (if (use-region-p)
-       (list (region-beginning) (region-end))
-     (list (point) (point))))
+(defun clang-format--git-diffs-get-diff-lines (file-orig file-new)
+  "Return all line regions that contain diffs between FILE-ORIG and
+FILE-NEW.  If there is no diff 'nil' is returned. Otherwise the
+return is a 'list' of lines in the format '--lines=<start>:<end>'
+which can be passed directly to 'clang-format'"
+  ;; Temporary buffer for output of diff.
+  (with-temp-buffer
+    (let ((status (call-process
+                   "diff"
+                   nil
+                   (current-buffer)
+                   nil
+                   ;; Binary diff has different behaviors that we
+                   ;; aren't interested in.
+                   "-a"
+                   ;; Printout changes as only the line groups.
+                   "--changed-group-format=--lines=%dF:%dL "
+                   ;; Ignore unchanged content.
+                   "--unchanged-group-format="
+                   file-orig
+                   file-new
+                   )
+                  )
+          (stderr (concat (if (zerop (buffer-size)) "" ": ")
+                          (buffer-substring-no-properties
+                           (point-min) (line-end-position)))))
+      (when (stringp status)
+        (error "(diff killed by signal %s%s)" status stderr))
+      (unless (= status 0)
+        (unless (= status 1)
+          (error "(diff returned unsuccessfully %s%s)" status stderr)))
+
+
+      (if (= status 0)
+          ;; Status == 0 -> no Diff.
+          nil
+        (progn
+          ;; Split "--lines=<S0>:<E0>... --lines=<SN>:<SN>" output to
+          ;; a list for return.
+          (s-split
+           " "
+           (string-trim
+            (buffer-substring-no-properties
+             (point-min) (point-max)))))))))
+
+(defun clang-format--git-diffs-get-git-head-file ()
+  "Returns a temporary file with the content of 'buffer-file-name' at
+git revision HEAD. If the current buffer is either not a file or not
+in a git repo, this results in an error"
+  ;; Needs current buffer to be a file
+  (unless (buffer-file-name)
+    (error "Buffer is not visiting a file"))
+  ;; Need to be able to find version control (git) root
+  (unless (vc-root-dir)
+    (error "File not known to git"))
+  ;; Need version control to in fact be git
+  (unless (string-equal (vc-backend (buffer-file-name)) "Git")
+    (error "Not using git"))
+
+  (let ((tmpfile-git-head (make-temp-file "clang-format-tmp-git-head-content")))
+    ;; Get filename relative to git root
+    (let ((git-file-name (substring
+                          (expand-file-name (buffer-file-name))
+                          (string-width (expand-file-name (vc-root-dir)))
+                          nil)))
+      (let ((status (call-process
+                     "git"
+                     nil
+                     `(:file, tmpfile-git-head)
+                     nil
+                     "show" (concat "HEAD:" git-file-name)))
+            (stderr (with-temp-buffer
+                      (unless (zerop (cadr (insert-file-contents tmpfile-git-head)))
+                        (insert ": "))
+                      (buffer-substring-no-properties
+                       (point-min) (line-end-position)))))
+        (when (stringp status)
+          (error "(git show HEAD:%s killed by signal %s%s)"
+                 git-file-name status stderr))
+        (unless (zerop status)
+          (error "(git show HEAD:%s returned unsuccessfully %s%s)"
+                 git-file-name status stderr))))
+    ;; Return temporary file so we can diff it.
+    tmpfile-git-head))
 
+(defun clang-format--region-impl (start end &optional style assume-file-name lines)
+  "Common implementation for 'clang-format-buffer',
+'clang-format-region', and 'clang-format-git-diffs'. START and END
+refer to the region to be formatter. STYLE and ASSUME-FILE-NAME are
+used for configuring the clang-format. And LINES is used to pass
+specific locations for reformatting (i.e diff locations)."
   (unless style
     (setq style clang-format-style))
 
@@ -176,8 +255,12 @@ uses the function `buffer-file-name'."
                                       (list "--assume-filename" assume-file-name))
                                ,@(and style (list "--style" style))
                                "--fallback-style" ,clang-format-fallback-style
-                               "--offset" ,(number-to-string file-start)
-                               "--length" ,(number-to-string (- file-end file-start))
+                               ,@(and lines lines)
+                               ,@(and (not lines)
+                                      (list
+                                       "--offset" (number-to-string file-start)
+                                       "--length" (number-to-string
+                                                   (- file-end file-start))))
                                "--cursor" ,(number-to-string cursor))))
               (stderr (with-temp-buffer
                         (unless (zerop (cadr (insert-file-contents temp-file)))
@@ -205,6 +288,48 @@ uses the function `buffer-file-name'."
       (delete-file temp-file)
       (when (buffer-name temp-buffer) (kill-buffer temp-buffer)))))
 
+;;;###autoload
+(defun clang-format-git-diffs (&optional style assume-file-name)
+  "The same as 'clang-format-buffer' but only operates on the git
+diffs from HEAD in the buffer. If no STYLE is given uses
+`clang-format-style'. Use ASSUME-FILE-NAME to locate a style config
+file. If no ASSUME-FILE-NAME is given uses the function
+`buffer-file-name'."
+  (interactive)
+  (let ((tmpfile-git-head
+         (clang-format--git-diffs-get-git-head-file))
+        (tmpfile-curbuf (make-temp-file "clang-format-git-tmp")))
+    ;; Move current buffer to a temporary file to take a diff. Even if
+    ;; current-buffer is backed by a file, we want to diff the buffer
+    ;; contents which might not be saved.
+    (write-region nil nil tmpfile-curbuf nil 'nomessage)
+    ;; Git list of lines with a diff.
+    (let ((diff-lines
+           (clang-format--git-diffs-get-diff-lines
+            tmpfile-git-head tmpfile-curbuf)))
+      ;; If we have any diffs, format them.
+      (when diff-lines
+        (clang-format--region-impl
+         (point-min)
+         (point-max)
+         style
+         assume-file-name
+         diff-lines)))))
+
+;;;###autoload
+(defun clang-format-region (start end &optional style assume-file-name)
+  "Use clang-format to format the code between START and END according
+to STYLE.  If called interactively uses the region or the current
+statement if there is no no active region. If no STYLE is given uses
+`clang-format-style'. Use ASSUME-FILE-NAME to locate a style config
+file, if no ASSUME-FILE-NAME is given uses the function
+`buffer-file-name'."
+  (interactive
+   (if (use-region-p)
+       (list (region-beginning) (region-end))
+     (list (point) (point))))
+  (clang-format--region-impl start end style assume-file-name))
+
 ;;;###autoload
 (defun clang-format-buffer (&optional style assume-file-name)
   "Use clang-format to format the current buffer according to STYLE.
@@ -212,7 +337,11 @@ If no STYLE is given uses `clang-format-style'. Use ASSUME-FILE-NAME
 to locate a style config file. If no ASSUME-FILE-NAME is given uses
 the function `buffer-file-name'."
   (interactive)
-  (clang-format-region (point-min) (point-max) style assume-file-name))
+  (clang-format--region-impl
+   (point-min)
+   (point-max)
+   style
+   assume-file-name))
 
 ;;;###autoload
 (defalias 'clang-format 'clang-format-region)

@goldsteinn
Copy link
Contributor Author

ping

@goldsteinn
Copy link
Contributor Author

ping2

@owenca
Copy link
Contributor

owenca commented Nov 2, 2024

@ideasman42 can you review this PR? Thanks!

@goldsteinn
Copy link
Contributor Author

@luke957 as well also can probably review the elisp code for quality.

@owenca owenca requested review from luke957 and removed request for owenca November 3, 2024 00:43
Copy link
Contributor

@ideasman42 ideasman42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I'm not a regular clang contributor, I just submitted a small improvement to clang-format.el, and maintain some emacs packages in elpa & melpa.


Overall the PR looks like it needs more attention to detail, as far as I can tell it's creating temporary files and never removing them, various minor issues noted inline.

  • This PR needs to be rebased on top of the recently added clang-format-on-save-mode commit.
  • Running package-lint reports.
156:19: warning: Closing parens should not be wrapped onto new lines.
157:18: warning: Closing parens should not be wrapped onto new lines.
176:12: error: You should depend on (emacs "24.4") or the compat package if you need `string-trim'.
188:11: error: You should depend on (emacs "25.1") if you need `vc-root-dir'.
198:59: error: You should depend on (emacs "25.1") if you need `vc-root-dir'.

(stderr (concat (if (zerop (buffer-size)) "" ": ")
(buffer-substring-no-properties
(point-min) (line-end-position)))))
(when (stringp status)
Copy link
Contributor

@ideasman42 ideasman42 Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*picky* I think a (cond ...) would read better than unless & if statements.

example:

(cond 
  ((stringp status)
    ...snip...)
  ((= status 0)
    nil)
  ((= status 1)
    ...snip...)
  (t
    (error "...snip...")))

(progn
;; Split "--lines=<S0>:<E0>... --lines=<SN>:<SN>" output to
;; a list for return.
(s-split
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s-split is not part of emacs standard library, depending on https://github.com/magnars/s.el doesn't seem worthwhile, use built-in emacs functionality.

(unless (buffer-file-name)
(error "Buffer is not visiting a file"))
;; Need to be able to find version control (git) root
(unless (vc-root-dir)
Copy link
Contributor

@ideasman42 ideasman42 Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*picky* let-bind to a variable and reuse it instead of calling twice.

;; Ignore unchanged content.
"--unchanged-group-format="
file-orig
file-new
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*picky* don't use dangling parenthesis.

(unless (string-equal (vc-backend (buffer-file-name)) "Git")
(error "Not using git"))

(let ((tmpfile-git-head (make-temp-file "clang-format-tmp-git-head-content")))
Copy link
Contributor

@ideasman42 ideasman42 Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell this temporary file is never deleted, further, any errors after creating the temporary file would leave it created.

Without attempting to do this myself, I'm not sure of the best solution but suggest: (with-temp-file ...)

The caller would need to use this and pass in the temporary file. If the behavior of with-temp-file isn't what your after, you could write your own macro that creates a scoped temporary file that gets removed in case of errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with-temp-file doesn't quite work here, V2 I just wrap the thing in an unwind-protect and delete the temp files in it.

(interactive)
(let ((tmpfile-git-head
(clang-format--git-diffs-get-git-head-file))
(tmpfile-curbuf (make-temp-file "clang-format-git-tmp")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this temporary file isn't removed either.

(when (buffer-name temp-buffer) (kill-buffer temp-buffer)))))

;;;###autoload
(defun clang-format-git-diffs (&optional style assume-file-name)
Copy link
Contributor

@ideasman42 ideasman42 Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more general point, would it make sense to replace the term git with something more generic like vc.

This way we wouldn't need clang-format-hg-diffs or separate commands to support other version control systems in the future.

It can be documented that git is currently the only supported version control, others could be added.

I also find the term diffs a bit confusing, would changed make more sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the term diff would be more consistent with existing emacs functions e.g. vc-diff, diff-delete-trailing-whitespace etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I chose diffs was to indicate it wasn't just the diff at point or something, but Ill rename clang-format-vc-diff

New proposed function `clang-format-git-diffs`.

It is the same as calling `clang-format-region` on all diffs between
the content of a buffer-file and the content of the file at git
revision HEAD. This is essentially the same thing as:
    `git-clang-format -f {filename}`
If the current buffer is saved.

The motivation is many project (LLVM included) both have code that is
non-compliant with there clang-format style and disallow unrelated
format diffs in PRs. This means users can't just run
`clang-format-buffer` on the buffer they are working on, and need to
manually go through all the regions by hand to get them
formatted. This is both an error prone and annoying workflow.
@goldsteinn
Copy link
Contributor Author

Note that I'm not a regular clang contributor, I just submitted a small improvement to clang-format.el, and maintain some emacs packages in elpa & melpa.

Overall the PR looks like it needs more attention to detail, as far as I can tell it's creating temporary files and never removing them, various minor issues noted inline.

* This PR needs to be rebased on top of the recently added `clang-format-on-save-mode` commit.

* Running `package-lint` reports.
156:19: warning: Closing parens should not be wrapped onto new lines.
157:18: warning: Closing parens should not be wrapped onto new lines.
176:12: error: You should depend on (emacs "24.4") or the compat package if you need `string-trim'.
188:11: error: You should depend on (emacs "25.1") if you need `vc-root-dir'.
198:59: error: You should depend on (emacs "25.1") if you need `vc-root-dir'.

Thank you for the detailed feedback, I will have v2 up tomorrow.

Regarding the package lints, is requiring 25.1 acceptable? I'm not really sure how to implement this without vc-root-dir.

@goldsteinn goldsteinn force-pushed the goldsteinn/emacs-clang-format-git branch from 3e7bac3 to 2c8395d Compare November 3, 2024 16:31
@goldsteinn
Copy link
Contributor Author

NB: The new API is 'clang-format-vc-diff'. I will update commit title when merging.

@goldsteinn
Copy link
Contributor Author

Current return from package-lint:

3 issues found:

1:0: error: Package should have a Homepage or URL header.
1:60: warning: You should depend on (emacs "24.1") if you need lexical-binding.
223:19: error: You should depend on (emacs "25.1") if you need `vc-root-dir'.

Copy link
Contributor

@ideasman42 ideasman42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems more ore less OK, although there are some areas that look like they can be improved/simplified.

@ideasman42
Copy link
Contributor

ideasman42 commented Nov 20, 2024

Suggesting a fairly large change to this PR, although it's quite opinionated.

If I were maintaining clang-format.el I'd split out the logic for generating a list of changes into it's own package.

Exactly how this is done is not be so important... it could for example support a custom function that generates a list of line number pairs.

clang-format-vc-diff could depend on a custom function which defaults to nil, and would warn when the function wasn't set. e.g. (defcustom clang-format-vc-diff-function ...)

The function can simply return an ordered list of integer pairs representing lines.

Then there can be a package on MELPA clang-format-diff or similar and all the issues with platform compatibility can be handled there without pull requests to LLVM.


There are a few reasons this has benefits.

  • The current PR misses (require ...) for, vc vc-git ... however requiring this is unnecessary for users who wont use the functionality.
  • Any bugs relating to details of different versions of git/diff/WIN32 compatibility can be handled without going via LLVM PR.
  • Support for other version control or other ways of generating change ranges can be handled even user-customized - integrated with other packages that already track changes - such as diff-hl could be used.
  • All the details of integrating diff-hl/other-version-control, platform support (proper error messages if diff or git isn't found or encounters some unknown encoding)... can be handled separately.
  • LLVM's clang-format.el remains minimal, users who fully format their source don't need to install the extra functionality.
  • Less maintenance burden for LLVM.

The main down side is users who rely on this behavior need to install an additional package although I don't see this as a big down-side.

@goldsteinn
Copy link
Contributor Author

Suggesting a fairly large change to this PR, although it's quite opinionated.

If I were maintaining clang-format.el I'd split out the logic for generating a list of changes into it's own package.

Exactly how this is done is not be so important... it could for example support a custom function that generates a list of line number pairs.

clang-format-vc-diff could depend on a custom function which defaults to nil, and would warn when the function wasn't set. e.g. (defcustom clang-format-vc-diff-function ...)

The function can simply return an ordered list of integer pairs representing lines.

Then there can be a package on MELPA clang-format-diff or similar and all the issues with platform compatibility can be handled there without pull requests to LLVM.

So IIUC, you are proposing a create an entirely seperate project for "vc-diff-lines" or something like that then making that a dependency for for clang-format? Is that correct?

There are a few reasons this has benefits.

* The current PR misses `(require ...)` for, `vc` `vc-git` ... however requiring this is unnecessary for users who wont use the functionality.

* Any bugs relating to details of different versions of git/diff/WIN32 compatibility can be handled without going via LLVM PR.

* Support for other version control or other ways of generating change ranges can be handled even user-customized - integrated with other packages that already track changes - such as `diff-hl` could be used.

* All the details of integrating diff-hl/other-version-control, platform support (proper error messages if `diff` or `git` isn't found or encounters some unknown encoding)... can be handled separately.

* LLVM's `clang-format.el` remains minimal, users who fully format their source don't need to install the extra functionality.

* Less maintenance burden for LLVM.

At least IMO, this is the other way around. Relying on external projects I think typically creates a higher maintainer burden, particularly in this case where I think LLVM would be the only user of the new package, so whenever changes where needed for the diff/vc stuff, it would require coordination with an external project.

Further, if the external project got its own users and clang-format.el's needs changed, it could become a lot more difficult to make the changes, as now there are independent users who might expect stability/etc...

The main down side is users who rely on this behavior need to install an additional package although I don't see this as a big down-side.

Overall, I'm pretty down on this. IMO, the vc/diff functionality is pretty specific to the use-case we have in clang-format.el and neither is complex enough to warrant or made more convenient by having in an independent package.

I think a similiar argument would imply the git-clang-format script we already maintain under LLVM should be extract out (or at least all the commands that call into "git") which seems equally unappealing.

@ideasman42
Copy link
Contributor

ideasman42 commented Nov 20, 2024

So IIUC, you are proposing a create an entirely seperate project for "vc-diff-lines" or something like that then making that a dependency for for clang-format? Is that correct?

I would not depend on it - instead, it would be loosely coupled - if clang-format-diff-function was set, it would be used. Otherwise clang-format-vc-diff wouldn't work (reporting the function to calculate changed lines needs to be set).


At least IMO, this is the other way around. Relying on external projects I think typically creates a higher maintainer burden, particularly in this case where I think LLVM would be the only user of the new package, so whenever changes where needed for the diff/vc stuff, it would require coordination with an external project.

This can happen for intricate API's - in this case - a single function that returns sorted line-ranges: (list (int . int) ...) is a fairly minimal interface, so I don't see separating the code out as adding significant overhead/burden.


Overall, I'm pretty down on this. IMO, the vc/diff functionality is pretty specific to the use-case we have in clang-format.el and neither is complex enough to warrant or made more convenient by having in an independent package.

Fair enough - as noted, this is fairly opinionated - and I don't think your "wrong" for holding the contrary position.

Keep in mind many people have been using clang-format.el for years without this functionality, and will continue to do so.
I disagree that this change is simple though, it seems simple on face value - based on the number of replies in the review so far, issues calling out to git & diff with their different versions ... possible optimizations, possible errors when they fail ... is in fact more complicated than may first seem.

There are some potential bugs that could bite us:

  • How to handle git failing (if git doesn't know about the file... the repository is in some unexpected state... the file could be in the middle of resolving a conflict for e.g.).
  • What if git or diff considers the file to be a binary file.
  • The emacs buffers should use the encoding settings from the buffer that is being edited... do they? (I'd need to double check - at a guess - they don't seem to at the moment).

This is not an attempt to deminish the contribution, just to note that there are all sorts of cases where things can go wrong - where a feature like this turns out not to be all that simple and there are corner cases that need to be investigated/resolved.

As noted, this is my suggestion to support a customizable line-range generator so clang-format.el can be kept minimal.
I'll leave the decision to others.

@goldsteinn
Copy link
Contributor Author

Overall, I'm pretty down on this. IMO, the vc/diff functionality is pretty specific to the use-case we have in clang-format.el and neither is complex enough to warrant or made more convenient by having in an independent package.

Fair enough - as noted, this is fairly opinionated - and I don't think your "wrong" for holding the contrary position.

Keep in mind many people have been using clang-format.el for years without this functionality, and will continue to do so. I disagree that this change is simple though, it seems simple on face value - based on the number of replies in the review so far, issues calling out to git & diff with their different versions ... possible optimizations, possible errors when they fail ... is in fact more complicated than may first seem.

There are some potential bugs that could bite us:

  • How to handle git failing (if git doesn't know about the file... the repository is in some unexpected state... the file could be in the middle of resolving a conflict for e.g.).
  • What if git or diff considers the file to be a binary file.
  • The emacs buffers should use the encoding settings from the buffer that is being edited... do they? (I'd need to double check - at a guess - they don't seem to at the moment).

I don't disagree these are all potential pitfalls (and there are certainly more), I just don't see how having the diff code in a separate project ameliorates any of them. And as stated earlier, I think it in fact complicates them.

@ideasman42
Copy link
Contributor

ideasman42 commented Nov 21, 2024

I don't disagree these are all potential pitfalls (and there are certainly more), I just don't see how having the diff code in a separate project ameliorates any of them. And as stated earlier, I think it in fact complicates them.

The issue of "complexity" is quite subjective, a customizable function that returns line ranges doesn't strike me as a complex API that's likely to break/change with the benefit of easily integrating other diffing methods.

From a user perspective it likely just means one extra package, possibly setting a configuration value.

Or, to avoid this PR having to handle system & version-control spesific details - we could consider calling vc-diff directly to generate the diff - this abstracts away all the details of calling git & diff directly, personally I would still rather keep the functionality separate - then this can be easily integrated with hl-diff or other packages that already track changed lines, removing the overhead of the current method used.

@goldsteinn
Copy link
Contributor Author

I don't disagree these are all potential pitfalls (and there are certainly more), I just don't see how having the diff code in a separate project ameliorates any of them. And as stated earlier, I think it in fact complicates them.

The issue of "complexity" is quite subjective, a customizable function that returns line ranges doesn't strike me as a complex API that's likely to break/change with the benefit of easily integrating other diffing methods.

From a user perspective it likely just means one extra package, possibly setting a configuration value.

Or, to avoid this PR having to handle system & version-control spesific details - we could consider calling vc-diff directly to generate the diff - this abstracts away all the details of calling git & diff directly, personally I would still rather keep the functionality separate - then this can be easily integrated with hl-diff or other packages that already track changed lines, removing the overhead of the current method used.

vc-diff AFAICT won't work because it operates on files not buffer contents. It would be a nice simplifcation, although doesn't really change the fact we depend on vc.

All things considered, I think requiring vc is a lot less intrusive than essentially our own vc wrapper....

@ideasman42
Copy link
Contributor

ideasman42 commented Nov 21, 2024

All things considered, I think requiring vc is a lot less intrusive than essentially our own vc wrapper....

Not sure what you mean exactly by a wrapper, if git/diff logic can be abstracted away - that seems a net gain.
If ugly/fragile logic is required to do that - it's debatable and may not be worth it (all things considered).

Anyway, I was only mentioning that we could consider abstracting away the logic - if it's not practical, there is no need to go into details discussing it, although I did mail the emacs-devel mailing list to check if this might be supported:

@goldsteinn
Copy link
Contributor Author

All things considered, I think requiring vc is a lot less intrusive than essentially our own vc wrapper....

Not sure what you mean exactly by a wrapper, if git/diff logic can be abstracted away - that seems a net gain. If ugly/fragile logic is required to do that - it's debatable and may not be worth it (all things considered).

Well the wrapped is to hide the ugly git/diff stuff.

Anyway, I was only mentioning that we could consider abstracting away the logic - if it's not practical, there is no need to go into details discussing it, although I did mail the emacs-devel mailing list to check if this might be supported:

My preference would be the get this in as is.

@goldsteinn
Copy link
Contributor Author

ping

1 similar comment
@goldsteinn
Copy link
Contributor Author

ping

@goldsteinn
Copy link
Contributor Author

ping @ideasman42

@goldsteinn
Copy link
Contributor Author

ping @ideasman42 or @lukel97

@ideasman42
Copy link
Contributor

ideasman42 commented Jan 5, 2025

Checking again and am still considering this patch too spesific/incomplete, checking vc's diff calls to git - they are considerably more involved than in this PR, meaning this PR will likely require follow up commits to fix problems (see vc-diff-internal, inlined below for reference, it deals with EOL conversion, added files, coding systems... things this PR doesn't really handle).

Attached a patch that allows for formatting line-ranges, the line range generation must be implemented externally.

  • The clang-format-modified-fn customizable function is used to return a list of "modified" line ranges, this can be set by 3rd party packages - VC implementation independent.
  • This function simply returns a list of integer pairs (line ranges).
  • An error is raised if the function isn't set.

Patch files:

Perhaps support for version control diff's can be supported by clang-format-plus.


As mentioned earlier, calling diff can be quite involved if all corner cases are properly handled.

(defun vc-diff-internal (async vc-fileset rev1 rev2 &optional verbose buffer)
  "Report diffs between revisions REV1 and REV2 of a fileset in VC-FILESET.
ASYNC non-nil means run the backend's commands asynchronously if possible.
VC-FILESET should have the format described in `vc-deduce-fileset'.
Output goes to the buffer BUFFER, which defaults to *vc-diff*.
BUFFER, if non-nil, should be a buffer or a buffer name.
Return t if the buffer had changes, nil otherwise."
  (unless buffer
    (setq buffer "*vc-diff*"))
  (let* ((files (cadr vc-fileset))
	 (messages (cons (format "Finding changes in %s..."
                                 (vc-delistify files))
                         (format "No changes between %s and %s"
                                 (or rev1 "working revision")
                                 (or rev2 "workfile"))))
	 ;; Set coding system based on the first file.  It's a kluge,
	 ;; but the only way to set it for each file included would
	 ;; be to call the back end separately for each file.
	 (coding-system-for-read
          ;; Force the EOL conversion to be -unix, in case the files
          ;; to be compared have DOS EOLs.  In that case, EOL
          ;; conversion will produce a patch file that will either
          ;; fail to apply, or will change the EOL format of some of
          ;; the lines in the patched file.
          (coding-system-change-eol-conversion
	   (if files (vc-coding-system-for-diff (car files)) 'undecided)
           'unix))
         (orig-diff-buffer-clone
          (if revert-buffer-in-progress-p
              (clone-buffer
               (generate-new-buffer-name " *vc-diff-clone*") nil))))
    ;; On MS-Windows and MS-DOS, Diff is likely to produce DOS-style
    ;; EOLs, which will look ugly if (car files) happens to have Unix
    ;; EOLs.  But for Git, we must force Unix EOLs in the diffs, since
    ;; Git always produces Unix EOLs in the parts that didn't come
    ;; from the file, and wants to see any CR characters when applying
    ;; patches.
    (if (and (memq system-type '(windows-nt ms-dos))
             (not (eq (car vc-fileset) 'Git)))
	(setq coding-system-for-read
	      (coding-system-change-eol-conversion coding-system-for-read
						   'dos)))
    (vc-setup-buffer buffer)
    (message "%s" (car messages))
    ;; Many backends don't handle well the case of a file that has been
    ;; added but not yet committed to the repo (notably CVS and Subversion).
    ;; Do that work here so the backends don't have to futz with it.  --ESR
    ;;
    ;; Actually most backends (including CVS) have options to control the
    ;; behavior since which one is better depends on the user and on the
    ;; situation).  Worse yet: this code does not handle the case where
    ;; `file' is a directory which contains added files.
    ;; I made it conditional on vc-diff-added-files but it should probably
    ;; just be removed (or copied/moved to specific backends).  --Stef.
    (when vc-diff-added-files
      (let ((filtered '())
	    process-file-side-effects)
        (dolist (file files)
          (if (or (file-directory-p file)
                  (not (string= (vc-working-revision file) "0")))
              (push file filtered)
            ;; This file is added but not yet committed;
            ;; there is no repository version to diff against.
            (if (or rev1 rev2)
                (error "No revisions of %s exist" file)
              ;; We regard this as "changed".
              ;; Diff it against /dev/null.
              (apply #'vc-do-command buffer
                     (if async 'async 1) "diff" file
                     (append (vc-switches nil 'diff) `(,(null-device)))))))
        (setq files (nreverse filtered))))
    (vc-call-backend (car vc-fileset) 'diff files rev1 rev2 buffer async)
    (set-buffer buffer)
    ;; Make the *vc-diff* buffer read only, the diff-mode key
    ;; bindings are nicer for read only buffers. pcl-cvs does the
    ;; same thing.
    (setq buffer-read-only t)
    (diff-mode)
    (setq-local diff-vc-backend (car vc-fileset))
    (setq-local diff-vc-revisions (list rev1 rev2))
    (setq-local revert-buffer-function
                (lambda (_ignore-auto _noconfirm)
                  (vc-diff-internal async vc-fileset rev1 rev2 verbose)))
    (if (and (zerop (buffer-size))
             (not (get-buffer-process (current-buffer))))
        ;; Treat this case specially so as not to pop the buffer.
        (progn
          (message "%s" (cdr messages))
          nil)
      ;; Display the buffer, but at the end because it can change point.
      (pop-to-buffer (current-buffer))
      ;; The diff process may finish early, so call `vc-diff-finish'
      ;; after `pop-to-buffer'; the former assumes the diff buffer is
      ;; shown in some window.
      (let ((buf (current-buffer)))
        (vc-run-delayed (vc-diff-finish buf (when verbose messages)
                                        orig-diff-buffer-clone)))
      ;; In the async case, we return t even if there are no differences
      ;; because we don't know that yet.
      t)))

@goldsteinn
Copy link
Contributor Author

Checking again and am still considering this patch too spesific/incomplete, checking vc's diff calls to git - they are considerably more involved than in this PR, meaning this PR will likely require follow up commits to fix problems (see vc-diff-internal, inlined below for reference, it deals with EOL conversion, added files, coding systems... things this PR doesn't really handle).

I'm not sure why the complexity of vc-diff-internal is really related to the complexity. We only use vc-root-dir and vc-backend. Further, the vc system is not some unstable new feature, its core since 25.1. I don't see why we would expect this to fall over all of a sudden.

Can you expand on what is incomplete or too specific about it. We have git-clang-format, this is essentially just porting that functionality to emacs. Is the issue that we only support "git"? Or something else?

Attached a patch that allows for formatting line-ranges, the line range generation must be implemented externally.

* The `clang-format-modified-fn` customizable function is used to return a list of "modified" line ranges, this can be set by 3rd party packages - VC implementation independent.

* This function simply returns a  list of integer pairs (line ranges).

* An error is raised if the function isn't set.

That seems far more specific and incomplete... but ultimately if its the only part that can be accepted its better than nothing.

Patch files:

* Patch on the main branch
  [pr-112792-update.diff.txt](https://github.com/user-attachments/files/18309256/pr-112792-update.diff.txt)

* The whole file (for convenience).
  [clang-format.el.update.txt](https://github.com/user-attachments/files/18309260/clang-format.el.update.txt)

* The git/diff logic extracted into a separate file - which would not be applied to the LLVM project, just use for testing.
  [clang-format-git-vc-diff.el.txt](https://github.com/user-attachments/files/18309263/clang-format-git-vc-diff.el.txt)

Perhaps support for version control diff's can be supported by clang-format-plus.

Ultimately I really want to get this into actual clang-format where it will be maintained (by myself included) and kept up to date.

I also think this functionality will be useful for other developers (lukel97 at least seemed to express he would find it useful).
Having it in some tertiary repo seems liable to have it end up 1) not much used and 2) in a state of decay in not to long.

Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay on this again. However I tried it out locally and it now seems to work on macOS, thanks for fixing that!

I really can't speak much for the elisp, but we don't have many reviewers for the emacs stuff and this feature would be really handy to have.

So LGTM, if it needs more review I think it can be done post-commit :)

@goldsteinn
Copy link
Contributor Author

Checking again and am still considering this patch too spesific/incomplete, checking vc's diff calls to git - they are considerably more involved than in this PR, meaning this PR will likely require follow up commits to fix problems (see vc-diff-internal, inlined below for reference, it deals with EOL conversion, added files, coding systems... things this PR doesn't really handle).

I'm not sure why the complexity of vc-diff-internal is really related to the complexity. We only use vc-root-dir and vc-backend. Further, the vc system is not some unstable new feature, its core since 25.1. I don't see why we would expect this to fall over all of a sudden.

Can you expand on what is incomplete or too specific about it. We have git-clang-format, this is essentially just porting that functionality to emacs. Is the issue that we only support "git"? Or something else?

Attached a patch that allows for formatting line-ranges, the line range generation must be implemented externally.

* The `clang-format-modified-fn` customizable function is used to return a list of "modified" line ranges, this can be set by 3rd party packages - VC implementation independent.

* This function simply returns a  list of integer pairs (line ranges).

* An error is raised if the function isn't set.

That seems far more specific and incomplete... but ultimately if its the only part that can be accepted its better than nothing.

Patch files:

* Patch on the main branch
  [pr-112792-update.diff.txt](https://github.com/user-attachments/files/18309256/pr-112792-update.diff.txt)

* The whole file (for convenience).
  [clang-format.el.update.txt](https://github.com/user-attachments/files/18309260/clang-format.el.update.txt)

* The git/diff logic extracted into a separate file - which would not be applied to the LLVM project, just use for testing.
  [clang-format-git-vc-diff.el.txt](https://github.com/user-attachments/files/18309263/clang-format-git-vc-diff.el.txt)

Perhaps support for version control diff's can be supported by clang-format-plus.

Ultimately I really want to get this into actual clang-format where it will be maintained (by myself included) and kept up to date.

I also think this functionality will be useful for other developers (lukel97 at least seemed to express he would find it useful). Having it in some tertiary repo seems liable to have it end up 1) not much used and 2) in a state of decay in not to long.

@ideasman42

(message "(clang-format: incomplete (syntax errors)%s)" stderr)
(message "(clang-format: success%s)" stderr))))
(delete-file temp-file)
(ignore-errors (delete-file temp-file))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest with-demoted-errors here too, could be a function as it's used elsewhere.

(delete-file (pop ,bind-files-to-delete)))))))


(defun clang-format--vc-diff-get-diff-lines (file-orig file-new)
Copy link
Contributor

@ideasman42 ideasman42 Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*picky* - personal preference, but I find it odd that the function returns command line arguments for clang-format.

It can return a list of cons-pairs of instead (list (cons int int) ...) instead, then the clang-format function can convert them to the arguments used by clang-format.

(the patch I attached did this), it's not a big change, I'd expect a formatting function that can operate on multiple ranges to take int-pairs instead of a list of strings.

Copy link
Contributor

@ideasman42 ideasman42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works well on Linux, generally seems fine although I still think it's worth considering splitting out "getting-changes" into a separate shared code-base to avoid dealing with details of calling diff for different version control system here.

(nreverse diff-lines))
;; Any return != 0 && != 1 indicates some level of error.
(t
(error "(diff returned unsuccessfully %s%s)" status stderr))))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All errors should use the "clang-format: " prefix, otherwise it can be difficult to track down errors when these functions are called indirectly.

@goldsteinn
Copy link
Contributor Author

``

Works well on Linux, generally seems fine although I still think it's worth considering splitting out "getting-changes" into a separate shared code-base to avoid dealing with details of calling diff for different version control system here.

Think lukel97 also tested on MacOS.

I see where you're coming from regarding the split, although still feel quite strongly that features that don't make it into the mainline project have a high risk of becoming derelict.

@goldsteinn
Copy link
Contributor Author

ping

@goldsteinn
Copy link
Contributor Author

@ideasman42 ping

@lukel97
Copy link
Contributor

lukel97 commented Feb 4, 2025

Just a heads up I've been using this locally for a bit now and it's been great, thanks for working on this. Haven't run into any issues so far.

@goldsteinn goldsteinn merged commit 32be90d into llvm:main Feb 4, 2025
8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 4, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-aarch64-darwin running on doug-worker-4 while building clang at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/14086

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: Analysis/live-stmts.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/clang -cc1 -internal-isystem /Users/buildbot/buildbot-root/aarch64-darwin/build/lib/clang/21/include -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer -w -analyzer-checker=debug.DumpLiveExprs /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp 2>&1   | /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp
+ /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/clang -cc1 -internal-isystem /Users/buildbot/buildbot-root/aarch64-darwin/build/lib/clang/21/include -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer -w -analyzer-checker=debug.DumpLiveExprs /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp
+ /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp
�[1m/Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp:239:16: �[0m�[0;1;31merror: �[0m�[1mCHECK-EMPTY: is not on the line after the previous match
�[0m// CHECK-EMPTY:
�[0;1;32m               ^
�[0m�[1m<stdin>:180:1: �[0m�[0;1;30mnote: �[0m�[1m'next' match was here
�[0m
�[0;1;32m^
�[0m�[1m<stdin>:177:1: �[0m�[0;1;30mnote: �[0m�[1mprevious match ended here
�[0m
�[0;1;32m^
�[0m�[1m<stdin>:178:1: �[0m�[0;1;30mnote: �[0m�[1mnon-matching line after previous match is here
�[0mImplicitCastExpr 0x11a81cd78 '_Bool' <LValueToRValue>
�[0;1;32m^
�[0m
Input file: <stdin>
Check file: /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
�[1m�[0m�[0;1;30m           1: �[0m�[1m�[0;1;46m �[0m
�[0;1;30m           2: �[0m�[1m�[0;1;46m�[0m[ B0 (live expressions at block exit) ]�[0;1;46m �[0m
�[0;1;32mcheck:21      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m�[0;1;30m           3: �[0m�[1m�[0;1;46m�[0m �[0m
�[0;1;32mempty:22      ^
�[0m�[0;1;30m           4: �[0m�[1m�[0;1;46m�[0m �[0m
�[0;1;32mempty:23      ^
�[0m�[0;1;30m           5: �[0m�[1m�[0;1;46m�[0m[ B1 (live expressions at block exit) ]�[0;1;46m �[0m
�[0;1;32mcheck:24      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m�[0;1;30m           6: �[0m�[1m�[0;1;46m�[0m �[0m
�[0;1;32mempty:25      ^
�[0m�[0;1;30m           7: �[0m�[1m�[0;1;46m�[0m �[0m
�[0;1;32mempty:26      ^
�[0m�[0;1;30m           8: �[0m�[1m�[0;1;46m�[0m[ B2 (live expressions at block exit) ]�[0;1;46m �[0m
�[0;1;32mcheck:27      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m�[0;1;30m           9: �[0m�[1m�[0;1;46m�[0m �[0m
�[0;1;32mempty:28      ^
�[0m�[0;1;30m          10: �[0m�[1m�[0;1;46m�[0mDeclRefExpr 0x10d85eee0 'int' lvalue ParmVar 0x10d841a70 'y' 'int'�[0;1;46m �[0m
�[0;1;32mnext:29       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m�[0;1;30m          11: �[0m�[1m�[0;1;46m�[0m �[0m
�[0;1;32mempty:30      ^
�[0m�[0;1;30m          12: �[0m�[1m�[0;1;46m�[0mDeclRefExpr 0x10d85ef00 'int' lvalue ParmVar 0x10d841af0 'z' 'int'�[0;1;46m �[0m
...

Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
…vm#112792)

New proposed function `clang-format-vc-diff`.

It is the same as calling `clang-format-region` on all diffs between
the content of a buffer-file and the content of the file at git
revision HEAD. This is essentially the same thing as:
    `git-clang-format -f {filename}`
If the current buffer is saved.

The motivation is many project (LLVM included) both have code that is
non-compliant with there clang-format style and disallow unrelated
format diffs in PRs. This means users can't just run
`clang-format-buffer` on the buffer they are working on, and need to
manually go through all the regions by hand to get them
formatted. This is both an error prone and annoying workflow.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants