Skip to content

Conversation

GantaRoja
Copy link
Collaborator

#2489 Added a new function and test case for mismatched title and heading

@qiskit-bot
Copy link
Contributor

Thanks for contributing to Qiskit documentation!

Before your PR can be merged, it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient. Thanks! 🙌

Copy link
Member

@frankharkins frankharkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@frankharkins frankharkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Thanks for your work on this

Copy link
Member

@frankharkins frankharkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This is looking really good 🎉

Aside from a few small things, there's one change I think we should make before merging this:

At the moment, both collectInvalidImageErrors and collectHeadingTitleMismatch accept a markdown string, then parse that string into an abstract syntax tree. The parsing step is relatively slow, so I think we should pull that out of the functions so we only parse the tree once.

To do this, you'd move the parsing step to L92 of checkMarkdown.ts, then change the two "collect..." functions to accept a tree: Root rather than markdown: string. Then tweak collectInvalidImageErrors so it re-uses the same tree, rather than parsing again.

Copy link
Member

@frankharkins frankharkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, thanks again

Comment on lines 20 to 26
function parseMarkdown(markdown: string): Root {
return unified()
.use(remarkParse)
.use(remarkGfm)
.use(remarkFrontmatter, ["yaml"])
.parse(markdown);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice helper function, what do you think about moving it to a new file (e.g. scripts/js/lib/markdownUtils.ts) and re-using it in checkMarkdown.ts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still seems relevant

Copy link
Collaborator

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Great work. I really like the approach of combining checks for better performance.

Comment on lines 20 to 26
function parseMarkdown(markdown: string): Root {
return unified()
.use(remarkParse)
.use(remarkGfm)
.use(remarkFrontmatter, ["yaml"])
.parse(markdown);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still seems relevant

Copy link
Collaborator

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

check Outdated
@@ -25,7 +25,7 @@ CHECKS = {
"metadata": ["npm", "run", "check:metadata"],
"patterns index pages": ["npm", "run", "check:patterns-index"],
"tutorials index page": ["python3", "scripts/ci/check-tutorials-index.py"],
"images": ["npm", "run", "check:images"],
"images": ["npm", "run", "check:markdown"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"images": ["npm", "run", "check:markdown"],
"markdown": ["npm", "run", "check:markdown"],

Comment on lines 17 to 18
// Helper to recursively extract visible text from heading node
function extractText(node: any): string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please call this extractHeadingText? And the parameter headingNode. Then you can remove the comment. We prefer "self-documenting code"

It'd be helpful to move this to markdownUtils.ts because we may want to use it in the future in other places. This is pretty generic code

return node.children.map(extractText).join(" ");
}

return "";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than returning an empty string, it's better to throw an error: new Error(`Could not parse heading node: ${node}`). This is "defensive programming" - our assumptions will have been invalidated that there is an edge case we haven't thought about. We want to know that right away.

It is safe to error here because these are developer productivity scripts; we are not breaking users of IBM Quantum, only developers of these docs.

});

// Compare and collect mismatch
if (frontmatterTitle && headingText && frontmatterTitle !== headingText) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, in the future, we will want to error if the frontmatterTitle or headingText are missing. Currently, check:metadata already does that. We'll consolidate the checks in a follow up.

import { collectHeadingTitleMismatch } from "./markdownTitles";
import { parseMarkdown } from "./markdownUtils";

test("Test for matching titles and headings", async () => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because all these tests are for the same function, it's more conventional to group the tests all together by using test.describe(), like this

test.describe("Qiskit ToC mirrors index page sections", () => {
test("validate assumptions", () => {
validateTopLevelModuleAssumptions();
});
test("dev", async () => {
await checkFolder("/dev");
});
test("latest", async () => {
await checkFolder("");
});
test("historical releases (1.1+)", async () => {
const folders = (
await readdir("docs/api/qiskit", { withFileTypes: true })
).filter(
(file) =>
file.isDirectory() && file.name.match(/[0-9].*/) && +file.name >= 1.1,
);
for (const folder of folders) {
await checkFolder(`/${folder.name}`);
}

Here, you could put in the description for test.describe() the string "collectHeadingTitleMismatch"). Then, name the test() cases as:

  • "valid"
  • "mismatched - simple h1"
  • "mismatched - complex h1"

Furthermore, there is a lot of duplication between the tests. It is conventional to DRY (Don't Repeat Yourself) this, i.e. to deduplicate it. You can define a helper function:

const assert = (markdown: string, expected: Set<string>): void => {
    const tree = parseMarkdown(markdown);
    const result = await collectHeadingTitleMismatch(tree);
    expect(result).toEqual(expected)
}


export async function collectHeadingTitleMismatch(
tree: Root,
): Promise<Set<string>> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you remember why Frank had you return a Set<string> rather than string | null?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I imagine it comes from copying the image code, right? That makes sense with images because there can be >1 error per file.

It'd be more accurate for this to return string | null. However, after looking at the checkMarkdown.ts script, that would make the code much more annoying to deal with. So, let's stick with this current implementation.

let headingText: string | undefined;

// Extract frontmatter title
visit(tree, "yaml", (node: any) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized this isn't going to work with Jupyter notebooks. They set their markdown differently. We'll need to figure out a new approach. We can pair program on this

@Eric-Arellano
Copy link
Collaborator

Eric-Arellano commented Aug 27, 2025

Hey Roja, I thought of how to get this working with Jupyter notebooks. In markdownReader.ts, define a new asynchronous function readMarkdownAndMetadata with the same arguments as readMarkdown. Return Promise<{content: string; metadata: Record<string, string>}. The implementation should look like readMarkdown, but you need to enhance it to also parse metadata by using the techniques in this function with grayMatter() and json.metadata:

const readMetadata = async (filePath: string): Promise<Record<string, any>> => {
const ext = filePath.split(".").pop();
if (ext === "md" || ext === "mdx") {
const content = await fs.readFile(filePath, "utf-8");
return grayMatter(content).data;
} else if (ext === "ipynb") {
const json = await readJsonFile(filePath);
return json.metadata;
} else {
throw new Error(`Unknown extension for ${filePath}: ${ext}`);
}
};

(Why do we want to combine markdown reading and metadata reading into one function? We want to reuse the readFile() call so that we are not reading the raw file twice. This should help with performance. Filesystem operations are expensive, along with buffering the whole file in memory.)

You're then going to want to have checkMarkdown.ts call readMarkdownAndMetadata. collectInvalidImageErrors should still stay the same, but collectHeadingTitleMismatch is going to change that it needs both the parsed markdown tree & the metadata as an argument. Update collectHeadingTitleMismatch to simply read metadata.title, and remove the yaml parsing. Update the unit tests.

Finally, refactor checkMetadata.ts to use your new readMarkdownAndMetadata function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

4 participants