-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add a doc_comments_missing_terminal_punctuation
lint
#15758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from 7 commits
1025af5
88bb926
39cac5e
6d083ea
ec5aa66
70a7dd6
5ca9ca1
0df651d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,88 @@ | ||||||
use rustc_errors::Applicability; | ||||||
use rustc_lint::LateContext; | ||||||
use rustc_resolve::rustdoc::main_body_opts; | ||||||
|
||||||
use pulldown_cmark::{Event, Options, Parser, Tag, TagEnd}; | ||||||
|
||||||
use super::{DOC_COMMENTS_MISSING_TERMINAL_PUNCTUATION, Fragments}; | ||||||
|
||||||
const MSG: &str = "doc comments should end with a terminal punctuation mark"; | ||||||
const PUNCTUATION_SUGGESTION: char = '.'; | ||||||
|
||||||
pub fn check(cx: &LateContext<'_>, doc: &str, fragments: Fragments<'_>) { | ||||||
// This ignores `#[doc]` attributes, which we do not handle. | ||||||
if let Some(offset) = is_missing_punctuation(doc) | ||||||
&& let Some(span) = fragments.span(cx, offset..offset) | ||||||
{ | ||||||
clippy_utils::diagnostics::span_lint_and_sugg( | ||||||
cx, | ||||||
DOC_COMMENTS_MISSING_TERMINAL_PUNCTUATION, | ||||||
span, | ||||||
MSG, | ||||||
"end the doc comment with some punctuation", | ||||||
PUNCTUATION_SUGGESTION.to_string(), | ||||||
Applicability::MaybeIncorrect, | ||||||
); | ||||||
} | ||||||
} | ||||||
|
||||||
#[must_use] | ||||||
/// If punctuation is missing, returns the offset where new punctuation should be inserted. | ||||||
fn is_missing_punctuation(doc_string: &str) -> Option<usize> { | ||||||
const TERMINAL_PUNCTUATION_MARKS: &[char] = &['.', '?', '!', '…']; | ||||||
|
||||||
let mut no_report_depth = 0; | ||||||
let mut text_offset = None; | ||||||
for (event, offset) in | ||||||
Parser::new_ext(doc_string, main_body_opts() - Options::ENABLE_SMART_PUNCTUATION).into_offset_iter() | ||||||
Comment on lines
+52
to
+53
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Double parsing every doc comment is not ideal, could this be tweaked into a state machine struct that |
||||||
{ | ||||||
match event { | ||||||
Event::Start( | ||||||
Tag::CodeBlock(..) | ||||||
| Tag::FootnoteDefinition(_) | ||||||
| Tag::Heading { .. } | ||||||
| Tag::HtmlBlock | ||||||
| Tag::List(..) | ||||||
| Tag::Table(_), | ||||||
) => { | ||||||
no_report_depth += 1; | ||||||
}, | ||||||
Event::End(TagEnd::FootnoteDefinition) => { | ||||||
no_report_depth -= 1; | ||||||
}, | ||||||
Event::End( | ||||||
TagEnd::CodeBlock | TagEnd::Heading(_) | TagEnd::HtmlBlock | TagEnd::List(_) | TagEnd::Table, | ||||||
) => { | ||||||
no_report_depth -= 1; | ||||||
text_offset = None; | ||||||
|
||||||
}, | ||||||
Event::InlineHtml(_) | Event::Start(Tag::Image { .. }) | Event::End(TagEnd::Image) => { | ||||||
text_offset = None; | ||||||
}, | ||||||
Event::Code(..) | Event::Start(Tag::Link { .. }) | Event::End(TagEnd::Link) | ||||||
if no_report_depth == 0 && !offset.is_empty() => | ||||||
{ | ||||||
text_offset = Some(offset.end); | ||||||
}, | ||||||
Event::Text(..) if no_report_depth == 0 && !offset.is_empty() => { | ||||||
// American-style quotes require punctuation to be placed inside closing quotation marks. | ||||||
if doc_string[..offset.end].trim_end().ends_with('"') { | ||||||
|
if doc_string[..offset.end].trim_end().ends_with('"') { | |
if doc_string[..offset.end].trim_end().ends_with(|c| c == '"' || c == ')') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but that means that the fix suggestion would always add a period inside the parentheses, while it is more likely that adding it outside would be the correct choice. That makes me realize that the suggestion for quotes would also arbitrarily favor the American style over the British one.
Maybe we should treat these cases (quotes and parentheses) as unfixable: detect when terminal punctuation is missing but give no suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I think we are overfocusing on edge cases right now: this lint (even without handling quotes and parentheses) would already be really useful for rolling out on new projects to ensure a consistent style, and we know that we will not be able to handle every little edge case without full-blown NLP anyway. These edge cases can easily be circumvented by simply slightly rephrasing (and I understand that some people do not like tools dictating how they write, but in this case this seems pretty minor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should treat these cases (quotes and parentheses) as unfixable: detect when terminal punctuation is missing but give no suggestions?
Yes, I agree. That would be great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I think we are overfocusing on edge cases right now: this lint (even without handling quotes and parentheses) would already be really useful for rolling out on new projects to ensure a consistent style, and we know that we will not be able to handle every little edge case without full-blown NLP anyway.
It's certainly true that we can't reach perfection without full AI. When we inevitably err, though, I'd prefer to err on the side of false negatives instead of false positives. We could even solve the parens and quotes problem by adding them as accepted terminating punctuation:
const TERMINAL_PUNCTUATION_MARKS: &[char] = &['.', '?', '!', '…', ')', '"', '\''];
The point is that we don't want to make people #[allow]
the lint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed an update that does the following:
- Consider both
.)
and).
to be correct (thus there could be false negatives in
the former case, but nothing we can do about it). - Consider both
."
and".
to be correct (same). - Otherwise the lint triggers and offers help, but does not suggest a fix.
The implementation could likely use some refactoring, but we should agree on the
behavior before that.
I did not modify TERMINAL_PUNCTUATION_MARKS
because that was causing issues with Markdown links, which also end with a parenthesis. I also did not do anything about single quotation marks, because my understanding is that those are used in British English and I would expect American style not be used in that case (and therefore no special treatment is needed).
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,169 @@ | ||
#![feature(custom_inner_attributes)] | ||
#![rustfmt::skip] | ||
#![warn(clippy::doc_comments_missing_terminal_punctuation)] | ||
|
||
/// Returns the Answer to the Ultimate Question of Life, the Universe, and Everything. | ||
//~^ doc_comments_missing_terminal_punctuation | ||
fn answer() -> i32 { | ||
42 | ||
} | ||
|
||
/// The `Option` type. | ||
//~^ doc_comments_missing_terminal_punctuation | ||
// Triggers even in the presence of another attribute. | ||
#[derive(Debug)] | ||
enum MyOption<T> { | ||
/// No value. | ||
//~^ doc_comments_missing_terminal_punctuation | ||
None, | ||
/// Some value of type `T`. | ||
Some(T), | ||
} | ||
|
||
// Triggers correctly even when interleaved with other attributes. | ||
/// A multiline | ||
#[derive(Debug)] | ||
/// doc comment: | ||
/// only the last line triggers the lint. | ||
//~^ doc_comments_missing_terminal_punctuation | ||
enum Exceptions { | ||
/// Question marks are fine? | ||
QuestionMark, | ||
/// Exclamation marks are fine! | ||
ExclamationMark, | ||
/// Ellipses are ok too… | ||
Ellipsis, | ||
/// HTML content is however not checked: | ||
/// <em>Raw HTML is allowed as well</em> | ||
RawHtml, | ||
/// The raw HTML exception actually does the right thing to autolinks: | ||
/// <https://spec.commonmark.org/0.31.2/#autolinks>. | ||
//~^ doc_comments_missing_terminal_punctuation | ||
MarkdownAutolink, | ||
/// This table introduction ends with a colon: | ||
/// | ||
/// | Exception | Note | | ||
/// | -------------- | ----- | | ||
/// | Markdown table | A-ok | | ||
MarkdownTable, | ||
/// Here is a snippet | ||
/// | ||
/// ``` | ||
/// // Code blocks are no issues. | ||
/// ``` | ||
CodeBlock, | ||
} | ||
|
||
// Check the lint can be expected on a whole enum at once. | ||
#[expect(clippy::doc_comments_missing_terminal_punctuation)] | ||
enum Char { | ||
/// U+0000 | ||
Null, | ||
/// U+0001 | ||
StartOfHeading, | ||
} | ||
|
||
// Check the lint can be expected on a single variant without affecting others. | ||
enum Char2 { | ||
#[expect(clippy::doc_comments_missing_terminal_punctuation)] | ||
/// U+0000 | ||
Null, | ||
/// U+0001. | ||
//~^ doc_comments_missing_terminal_punctuation | ||
StartOfHeading, | ||
} | ||
|
||
mod module { | ||
//! Works on | ||
//! inner attributes too. | ||
//~^ doc_comments_missing_terminal_punctuation | ||
} | ||
|
||
enum Trailers { | ||
/// Sometimes the doc comment ends with parentheses (like this). | ||
//~^ doc_comments_missing_terminal_punctuation | ||
EndsWithParens, | ||
/// (Sometimes the last sentence is in parentheses, but there is no special treatment of this.). | ||
//~^ doc_comments_missing_terminal_punctuation | ||
ParensFailing, | ||
/// **Sometimes the last sentence is in bold, and that's ok.** | ||
DoubleStarPassing, | ||
/// **But sometimes it is missing a period.** | ||
//~^ doc_comments_missing_terminal_punctuation | ||
DoubleStarFailing, | ||
/// _Sometimes the last sentence is in italics, and that's ok._ | ||
UnderscorePassing, | ||
/// _But sometimes it is missing a period._ | ||
//~^ doc_comments_missing_terminal_punctuation | ||
UnderscoreFailing, | ||
/// This comment ends with "a quote." | ||
QuotePassing, | ||
/// This comment ends with "a quote." | ||
//~^ doc_comments_missing_terminal_punctuation | ||
QuoteFailing, | ||
} | ||
|
||
/// Doc comments can end with an [inline link](#anchor). | ||
//~^ doc_comments_missing_terminal_punctuation | ||
struct InlineLink; | ||
|
||
/// Some doc comments contain [link reference definitions][spec]. | ||
//~^ doc_comments_missing_terminal_punctuation | ||
/// | ||
/// [spec]: https://spec.commonmark.org/0.31.2/#link-reference-definitions | ||
struct LinkRefDefinition; | ||
|
||
// List items do not always need to end with a period. | ||
enum UnorderedLists { | ||
/// This list has an introductory sentence: | ||
/// | ||
/// - A list item | ||
Dash, | ||
/// + A list item | ||
Plus, | ||
/// * A list item | ||
Star, | ||
} | ||
|
||
enum OrderedLists { | ||
/// 1. A list item | ||
Dot, | ||
/// 42) A list item | ||
Paren, | ||
} | ||
|
||
/// Doc comments with trailing blank lines are supported. | ||
//~^ doc_comments_missing_terminal_punctuation | ||
/// | ||
struct TrailingBlankLine; | ||
|
||
/// The first paragraph is not checked | ||
/// | ||
/// Other sentences are not either | ||
/// Only the last sentence is. | ||
//~^ doc_comments_missing_terminal_punctuation | ||
struct OnlyLastSentence; | ||
|
||
/// ``` | ||
struct IncompleteBlockCode; | ||
|
||
/// This ends with a code `span`. | ||
//~^ doc_comments_missing_terminal_punctuation | ||
struct CodeSpan; | ||
|
||
#[expect(clippy::empty_docs)] | ||
/// | ||
struct EmptyDocComment; | ||
|
||
/** | ||
* Block doc comments work. | ||
* | ||
*/ | ||
//~^^^ doc_comments_missing_terminal_punctuation | ||
struct BlockDocComment; | ||
|
||
/// Sometimes a doc attribute is used for concatenation | ||
/// ``` | ||
#[doc = ""] | ||
/// ``` | ||
struct DocAttribute; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding catching individual paragraphs, would it suffice to track the unconditional depth of start/end tags and check when it goes from 1 -> 0 due to a
TagEnd::Paragraph
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, the "rules" for what makes sense in the middle of a comment are different from the rules that make sense at the end. For example, here's a few crates where a paragraph ends in a colon:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see that being a problem, it would no longer lint on a final paragraph ending in
:
but that doesn't seem like it would be common enough to need a lint