-
Notifications
You must be signed in to change notification settings - Fork 129
Add check that page title is in sync with ToC, h1, and metadata #3669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks for contributing to Qiskit documentation! Before your PR can be merged, it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient. Thanks! 🙌 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Thanks for your work on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This is looking really good 🎉
Aside from a few small things, there's one change I think we should make before merging this:
At the moment, both collectInvalidImageErrors
and collectHeadingTitleMismatch
accept a markdown string, then parse that string into an abstract syntax tree. The parsing step is relatively slow, so I think we should pull that out of the functions so we only parse the tree once.
To do this, you'd move the parsing step to L92 of checkMarkdown.ts
, then change the two "collect..." functions to accept a tree: Root
rather than markdown: string
. Then tweak collectInvalidImageErrors
so it re-uses the same tree, rather than parsing again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work, thanks again
function parseMarkdown(markdown: string): Root { | ||
return unified() | ||
.use(remarkParse) | ||
.use(remarkGfm) | ||
.use(remarkFrontmatter, ["yaml"]) | ||
.parse(markdown); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice helper function, what do you think about moving it to a new file (e.g. scripts/js/lib/markdownUtils.ts
) and re-using it in checkMarkdown.ts
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment still seems relevant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Great work. I really like the approach of combining checks for better performance.
function parseMarkdown(markdown: string): Root { | ||
return unified() | ||
.use(remarkParse) | ||
.use(remarkGfm) | ||
.use(remarkFrontmatter, ["yaml"]) | ||
.parse(markdown); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment still seems relevant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
check
Outdated
@@ -25,7 +25,7 @@ CHECKS = { | |||
"metadata": ["npm", "run", "check:metadata"], | |||
"patterns index pages": ["npm", "run", "check:patterns-index"], | |||
"tutorials index page": ["python3", "scripts/ci/check-tutorials-index.py"], | |||
"images": ["npm", "run", "check:images"], | |||
"images": ["npm", "run", "check:markdown"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"images": ["npm", "run", "check:markdown"], | |
"markdown": ["npm", "run", "check:markdown"], |
scripts/js/lib/markdownTitles.ts
Outdated
// Helper to recursively extract visible text from heading node | ||
function extractText(node: any): string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please call this extractHeadingText
? And the parameter headingNode
. Then you can remove the comment. We prefer "self-documenting code"
It'd be helpful to move this to markdownUtils.ts
because we may want to use it in the future in other places. This is pretty generic code
scripts/js/lib/markdownTitles.ts
Outdated
return node.children.map(extractText).join(" "); | ||
} | ||
|
||
return ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than returning an empty string, it's better to throw an error: new Error(`Could not parse heading node: ${node}`)
. This is "defensive programming" - our assumptions will have been invalidated that there is an edge case we haven't thought about. We want to know that right away.
It is safe to error here because these are developer productivity scripts; we are not breaking users of IBM Quantum, only developers of these docs.
}); | ||
|
||
// Compare and collect mismatch | ||
if (frontmatterTitle && headingText && frontmatterTitle !== headingText) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, in the future, we will want to error if the frontmatterTitle or headingText are missing. Currently, check:metadata
already does that. We'll consolidate the checks in a follow up.
import { collectHeadingTitleMismatch } from "./markdownTitles"; | ||
import { parseMarkdown } from "./markdownUtils"; | ||
|
||
test("Test for matching titles and headings", async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because all these tests are for the same function, it's more conventional to group the tests all together by using test.describe()
, like this
documentation/scripts/js/lib/api/TocGrouping.test.ts
Lines 145 to 167 in eec1dae
test.describe("Qiskit ToC mirrors index page sections", () => { | |
test("validate assumptions", () => { | |
validateTopLevelModuleAssumptions(); | |
}); | |
test("dev", async () => { | |
await checkFolder("/dev"); | |
}); | |
test("latest", async () => { | |
await checkFolder(""); | |
}); | |
test("historical releases (1.1+)", async () => { | |
const folders = ( | |
await readdir("docs/api/qiskit", { withFileTypes: true }) | |
).filter( | |
(file) => | |
file.isDirectory() && file.name.match(/[0-9].*/) && +file.name >= 1.1, | |
); | |
for (const folder of folders) { | |
await checkFolder(`/${folder.name}`); | |
} |
Here, you could put in the description for test.describe()
the string "collectHeadingTitleMismatch")
. Then, name the test()
cases as:
"valid"
"mismatched - simple h1"
"mismatched - complex h1"
Furthermore, there is a lot of duplication between the tests. It is conventional to DRY (Don't Repeat Yourself) this, i.e. to deduplicate it. You can define a helper function:
const assert = (markdown: string, expected: Set<string>): void => {
const tree = parseMarkdown(markdown);
const result = await collectHeadingTitleMismatch(tree);
expect(result).toEqual(expected)
}
|
||
export async function collectHeadingTitleMismatch( | ||
tree: Root, | ||
): Promise<Set<string>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you remember why Frank had you return a Set<string>
rather than string | null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I imagine it comes from copying the image code, right? That makes sense with images because there can be >1 error per file.
It'd be more accurate for this to return string | null
. However, after looking at the checkMarkdown.ts
script, that would make the code much more annoying to deal with. So, let's stick with this current implementation.
let headingText: string | undefined; | ||
|
||
// Extract frontmatter title | ||
visit(tree, "yaml", (node: any) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized this isn't going to work with Jupyter notebooks. They set their markdown differently. We'll need to figure out a new approach. We can pair program on this
Hey Roja, I thought of how to get this working with Jupyter notebooks. In documentation/scripts/js/commands/checkMetadata.ts Lines 40 to 51 in 9dc9194
(Why do we want to combine markdown reading and metadata reading into one function? We want to reuse the You're then going to want to have Finally, refactor |
#2489 Added a new function and test case for mismatched title and heading