Skip to content

Conversation

@zhjwpku
Copy link
Collaborator

@zhjwpku zhjwpku commented Jan 16, 2025

ci: add clang-tidy check

This PR also changed cpp-linter's tidy-checks option to reduce misunderstanding,
Rely on a .clang-tidy config file by specifying as a blank string (''), see [0] for details.

[0] https://cpp-linter.github.io/cpp-linter-action/inputs-outputs/#tidy-checks

@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Jan 16, 2025

This is the outcome of @lidavidm 's question whether we should use lowerCamelCase or UpperCamelCase for method names. I referenced ORC and Arrow's clang-tidy configs.

If a PR want to commit some functions not following conventions, cpp-linter-action will report suggestions, see zhjwpku#2 as a example.

@lidavidm
Copy link
Member

More generally: is there a particular style guide (e.g. Google, LLVM, ...) that we want to adopt?

@wgtmac
Copy link
Member

wgtmac commented Jan 17, 2025

Should we just follow what Arrow does and adjust it as needed? I think it is based on the Google coding style with some modifications and some contributors are already familiar with it.

I have also checked that Impala has derived from the Google style as well: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536. @gaborkaszab WDYT?

@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Jan 17, 2025

More generally: is there a particular style guide (e.g. Google, LLVM, ...) that we want to adopt?

I didn't find such one line setting, this PR sets lots of readability-identifier-naming rules, some might not be reasonable, we can remove some if someone knows it's wrong.

@lidavidm
Copy link
Member

Should we just follow what Arrow does and adjust it as needed? I think it is based on the Google coding style with some modifications and some contributors are already familiar with it.

I have also checked that Impala has derived from the Google style as well: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536. @gaborkaszab WDYT?

IIRC the one complication is that Arrow follows a bit of an old version of Google style (notably: Arrow still prefers T* over T& while Google style now prefers the latter)

@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Jan 17, 2025

Should we just follow what Arrow does and adjust it as needed? I think it is based on the Google coding style with some modifications and some contributors are already familiar with it.

I have also checked that Impala has derived from the Google style as well: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536. @gaborkaszab WDYT?

I do notice that there is a google-objc-function-naming[1] check, but it seems does report the anomaly(see [2]), i.e. printPuffin not reported.

[1] https://releases.llvm.org/12.0.0/tools/clang/tools/extra/docs/clang-tidy/checks/google-objc-function-naming.html
[2] https://github.com/zhjwpku/iceberg-cpp/pull/2/files

@lidavidm
Copy link
Member

As the docs state, that's for Objective-C, so I wouldn't expect it to help

@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Jan 17, 2025

As the docs state, that's for Objective-C, so I wouldn't expect it to help

Yeah, I did not notice that ;(

@gaborkaszab
Copy link
Collaborator

Should we just follow what Arrow does and adjust it as needed? I think it is based on the Google coding style with some modifications and some contributors are already familiar with it.

I have also checked that Impala has derived from the Google style as well: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536. @gaborkaszab WDYT?

Yeah, well I'm not sure how much of the Goggle style we keep in Impala, for me any convention is fine as long as it's well defined and enforced to everyone. I guess Arrow is well adopted enough to follow that, but frankly I have no experience with that so I'll leave this for people involved with that project.

@pitrou
Copy link
Member

pitrou commented Jan 20, 2025

I don't think we've had any major issues with our style conventions on Arrow C++.

The only annoyance I have personally is the ambiguity around method naming, between CamelCase for most methods and snake_case for "cheap" accessors. I would rather a single method naming (and, coming from Python, I generally like snake_case the best, but that's just personal :) ).

@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Jan 20, 2025

I don't think we've had any major issues with our style conventions on Arrow C++.

The only annoyance I have personally is the ambiguity around method naming, between CamelCase for most methods and snake_case for "cheap" accessors. I would rather a single method naming (and, coming from Python, I generally like snake_case the best, but that's just personal :) ).

I'm wondering if the Arrow style enforced by CI rules or just committers enforce it while reviewing PRs? It seems not easy for a bot to tell whether a method is "cheap" or not, I might be wrong though.

@pitrou
Copy link
Member

pitrou commented Jan 20, 2025

Most of it is enforced by committers on PRs. It would probably be nice to have more automated enforcement, though.

@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Jan 20, 2025

I found a clang-tidy readability-identifier-naming for Google's naming convention [1], the gist has 7 revisions and seems quite popular(18 stars), does it make sense we adopt it?

[1] https://gist.github.com/airglow923/1fa3bda42f2b193920d7f46ee8345e04

@zhjwpku zhjwpku force-pushed the add_clang_tidy_checks branch from 8409c64 to 64a3d57 Compare January 23, 2025 14:47
@zhjwpku zhjwpku force-pushed the add_clang_tidy_checks branch from 64a3d57 to 1f122f0 Compare January 23, 2025 14:49
@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Jan 23, 2025

As @lidavidm is working on the type system, we seems to prefer Arrow-style, so I'm proposing we adopt Arrow's clang-tidy.

There are two points I want to mention here:

  • I did not include -modernize-avoid-c-arrays, we can add this later when we really want the c array style
  • -modernize-use-nodiscard is include here, because we don't add [[nodiscard]] before the existing functions, I notice @lidavidm uses [[nodiscard]] in the type system code, so I'm wondering should we change all functions to this style?

@wgtmac @lidavidm @pitrou what do you think?

@lidavidm
Copy link
Member

Sounds good to me.

Signed-off-by: Junwang Zhao <[email protected]>
@wgtmac
Copy link
Member

wgtmac commented Jan 24, 2025

TBH, I'm not in favor of adding [[nodiscard]] every where. It is boring to type it for every function and make the function signature too verbose. How about only adding [[nodiscard]] to functions that will lead to fatal error if return values are discarded?

See https://stackoverflow.com/questions/77242978/what-are-the-common-practices-and-considerations-for-using-the-nodiscard-att

@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Jan 24, 2025

TBH, I'm not in favor of adding [[nodiscard]] every where. It is boring to type it for every function and make the function signature too verbose. How about only adding [[nodiscard]] to functions that will lead to fatal error if return values are discarded?

Make sense, maybe we can depend on -Wunused-result(which is default) to let compiler warn if function return value is not used. As for the user interfaces, I think we should add [[nodiscard]]. @lidavidm thoughts?

See https://stackoverflow.com/questions/77242978/what-are-the-common-practices-and-considerations-for-using-the-nodiscard-att

@lidavidm
Copy link
Member

If we are going to use a Result/Status object, then I think we should have [[nodiscard]] enforcement. If we are going to use exceptions then I don't feel so strongly.

@wgtmac
Copy link
Member

wgtmac commented Jan 24, 2025

If we are going to use a Result/Status object, then I think we should have [[nodiscard]] enforcement. If we are going to use exceptions then I don't feel so strongly.

Then we can declare class [[nodiscard]] Result to achieve this.

@lidavidm
Copy link
Member

Ah, interesting. (I am used to the std::expected backport which doesn't do this, so I always put it by hand everywhere...)

Anyways, I don't feel strongly about this. If you want to remove nodiscard then let's just proceed, I would rather have some static checks than none.

@wgtmac
Copy link
Member

wgtmac commented Jan 24, 2025

I don't feel strong on this. We can always enforce static check on functions marked as [[nodiscard]]. If we eventually go with the std::expected approach, we have to either use a wrapped expected class or manually add [[nodiscard]] to all functions returning std::expected.

@wgtmac
Copy link
Member

wgtmac commented Jan 24, 2025

nit: change the PR title to ci: add clang-tidy check

Copy link
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have never configured the cpp-linter action previously but don't we have to update the tidy-checks option to an empty string in order to use that config file? From the docs:

It is also possible to rely solely on a .clang-tidy config file by specifying this option as a blank string ('').

My question is if this is currently being exercised and enforced on our linter CI step

@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Jan 24, 2025

I have never configured the cpp-linter action previously but don't we have to update the tidy-checks option to an empty string in order to use that config file? From the docs:

It is also possible to rely solely on a .clang-tidy config file by specifying this option as a blank string ('').

My question is if this is currently being exercised and enforced on our linter CI step

I checked that while working on this PR, both '' and file works.

@raulcd
Copy link
Member

raulcd commented Jan 24, 2025

I checked that while working on this PR, both '' and file works.

Thanks for confirming, I didn't see the file documented and was wondering why we had it.

@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Jan 24, 2025

I checked that while working on this PR, both '' and file works.

Thanks for confirming, I didn't see the file documented and was wondering why we had it.

Let's change file to '' in case future misunderstanding.

Rely on a .clang-tidy config file by specifying `tidy-checks` option
as a blank string (''), see [0] for details.

[0] https://cpp-linter.github.io/cpp-linter-action/inputs-outputs/#tidy-checks

Signed-off-by: Junwang Zhao <[email protected]>
@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Jan 24, 2025

Hi @Xuanwo @Fokko , I think this PR is ready to merge, please take a look when you got the time.

@pitrou
Copy link
Member

pitrou commented Jan 24, 2025

TBH, I'm not in favor of adding [[nodiscard]] every where. It is boring to type it for every function and make the function signature too verbose. How about only adding [[nodiscard]] to functions that will lead to fatal error if return values are discarded?

I also think that it does not make sense to add [[nodiscard]] everywhere. It should only be used when ignoring the return value would be an error, for example if the return value itself is an error code :)

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @zhjwpku for working on this, and thanks @lidavidm @wgtmac @raulcd for the review 🚀

@Fokko Fokko merged commit 7e08c33 into apache:main Jan 30, 2025
6 checks passed
@zhjwpku zhjwpku deleted the add_clang_tidy_checks branch February 1, 2025 02:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants