Skip to content

[CIR] [RFC][NFC]: Reformatting ClangIR identifier naming to follow clang-tidy to streamline development and better clang-tidy diagnostics #1912

@badumbatish

Description

@badumbatish

Hi everyone, I’m opening this RFC NFC to propose a new approach to enforcing coding guideline style in the ClangIR Incubator.

Current situation

The coding style in ClangIR is current split between two different styles: LLVM and MLIR due to part of the code being written in LLVM style (a long time ago), and some other code being written in MLIR. Therefore, the style of the newly contributed code depends largely on a developer’s best judgement and what style the surrounding code is.

While this gives more flexibility to the developers, it eats up reviewer’s time and mental efforts, as well as inundate clang-tidy’s ability to guide developers to more helpful diagnostics.

Sometimes, despite the guideline that says “consistent with other nearby code, whatever that convention happens to be”, there are still two different styles in the same file. This is because a reviewer’s through github UI doesn’t get to see a file’s whole content to decide if the style is being followed or not, and the code contributor might accidentally misremember a certain style that a file in a specific folder must follow.

For example, this is the current guideline from the website

Method and class names should just follow LLVM, but for the variable names one should follow the code structure:
- clang/include -> LLVM
- lib/Driver/* -> LLVM
- lib/FrontendTool -> LLVM
- lib/CIR/FrontendAction -> LLVM
- lib/CIR/CodeGen -> MLIR
- lib/CIR/* -> MLIR

Proposal

With .clang-tidy files declared in each respective folder that we wish to follow the rules to, I propose that we start utilizing clang-tidy’s automatic checking and fixing of the coding style.

I first list the capabilities and how to perform the check and fixes and after that, the advantages and disadvantages.

Running clang-tidy

I’ll be running clang-tidy through the run-clang-tidy driver in ./clang-tools-extra/clang-tidy/tool/.

run-clang-tidy accepts the following:

  • a build folder to find compilation database (build)
  • a folder for it to check and fix (clang/lib/CIR)
  • checks to run (readability-identifier-naming)
  • whether to fix and format the code after checking or not (via --fix and --format)

To conclude, this is the command to run

run-clang-tidy -p build clang/lib/CIR -checks="-*,readability-identifier-naming" -fix -format  -j 

Advantages

Below lists a few advantage to this:
Massive automatic checking and fixing of the style in the current codebase that adheres to the guideline.
clang-tidy can adhere to and keep track of the guidelines without developer’ subjective taste.
We can set up a CI/CD for it to ensure no more incoherent style and style mixing, saving both reviewers and contributor time.
We can elect to format smaller folders first, splitting up a potentially big PR to smaller PRs to ensure nothing’s missed. Once we have formatted a small folder, the new CI/CD can be changed to force coding style to that specific folder instead of all folders in clang/lib/CIR, making it manageable to migrate to the new style.

Disadvantages

Disadvantages are the following:

  • Longer CI time: running time run-clang-tidy -p build clang/lib/CIR -checks="-*,readability-identifier-naming" on Mac M4 Pro takes
run-clang-tidy -p build clang/lib/CIR   506.79s user 24.51s system 1293% cpu 41.085 total
  • Disruptions to current PRs: after the formatting changes are merged, new PRs would have to be redone to adhere to the new style.
  • Disruptions to upstream: doing this would require effort to push to upstream these changes. I’m not quite involve in the upstream process but this is something to be concerned about.
  • clang-tidy isn’t smart, if there is two existing variable “foo” and “Foo”, automatic fixing will require manual renaming either of the variables to something different. Fortunately, there isn’t many instance such as this.
  • Sometimes in some source folder, .clang-tidy file dictates we use CamelCase for variables, which coincides with a class naming of some include files, in this case we'd have to append "class" to said type
Image

Steps

I would run run-clang-tidy checks on a small folder, and open a PR for it with the CI/CD check of style for that folder only.

As I open more and more PR for newer folders, new folders' path would be added to the CI/CD check.

Once we're all done with folders, we can change the docs in https://llvm.github.io/clangir/GettingStarted/coding-guideline.html, advising new contributors to rely on clang-tidy's automatic checking.

I'd need to research where to put the new CI/CD in or should I make a new file and such as I don't have much exp with llvm and clangir's CI/CD

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions