Add check that page title is in sync with ToC, h1, and metadata #3669

GantaRoja · 2025-08-06T14:49:36Z

#2489 Added a new function and test case for mismatched title and heading

qiskit-bot · 2025-08-06T14:49:46Z

Thanks for contributing to Qiskit documentation!

Before your PR can be merged, it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient. Thanks! 🙌

scripts/js/lib/markdownTitles.test.ts

scripts/js/lib/markdownTitles.ts

frankharkins

Thanks!

scripts/js/lib/markdownTitles.ts

frankharkins

Looking good! Thanks for your work on this

scripts/js/commands/checkMarkdown.ts

package.json

frankharkins

Thank you! This is looking really good 🎉

Aside from a few small things, there's one change I think we should make before merging this:

At the moment, both collectInvalidImageErrors and collectHeadingTitleMismatch accept a markdown string, then parse that string into an abstract syntax tree. The parsing step is relatively slow, so I think we should pull that out of the functions so we only parse the tree once.

To do this, you'd move the parsing step to L92 of checkMarkdown.ts, then change the two "collect..." functions to accept a tree: Root rather than markdown: string. Then tweak collectInvalidImageErrors so it re-uses the same tree, rather than parsing again.
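A minimal sketch of the "parse once, share the tree" idea, using no external packages. The `MdNode` type, the toy `parseMarkdown`, and the two collector functions here are hypothetical stand-ins, not the real unified/remark parser or the PR's actual functions; the point is only that the caller parses once and passes the same tree to each check.

```typescript
// Minimal stand-in for the mdast node types (the real code uses `Root`).
interface MdNode {
  type: string;
  value?: string;
  url?: string;
  alt?: string;
  depth?: number;
  children?: MdNode[];
}

// Counts parser invocations so the "parse once" property is observable.
let parseCount = 0;

// Toy parser, for illustration only: it recognizes `# heading` and `![alt](url)` lines.
function parseMarkdown(markdown: string): MdNode {
  parseCount += 1;
  const children: MdNode[] = [];
  for (const line of markdown.split("\n")) {
    const heading = line.match(/^# (.+)$/);
    const image = line.match(/^!\[(.*)\]\((.*)\)$/);
    if (heading) {
      children.push({
        type: "heading",
        depth: 1,
        children: [{ type: "text", value: heading[1] }],
      });
    } else if (image) {
      children.push({ type: "image", alt: image[1], url: image[2] });
    }
  }
  return { type: "root", children };
}

// Both collectors take the already-parsed tree instead of a markdown string.
function collectImagesWithoutAltText(tree: MdNode): Set<string> {
  const errors = new Set<string>();
  for (const node of tree.children || []) {
    if (node.type === "image" && !node.alt) {
      errors.add(`Image is missing alt text: ${node.url}`);
    }
  }
  return errors;
}

function collectH1Texts(tree: MdNode): string[] {
  return (tree.children || [])
    .filter((node) => node.type === "heading" && node.depth === 1)
    .map((node) => (node.children || []).map((c) => c.value || "").join(""));
}

// The caller parses once and hands the same tree to every check.
const tree = parseMarkdown("# My page\n![](diagram.png)\n![A circuit](circuit.png)");
const imageErrors = collectImagesWithoutAltText(tree);
const h1Texts = collectH1Texts(tree);
```

With this shape, adding another check costs one more tree walk but no extra parse.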

scripts/js/lib/markdownTitles.ts

scripts/js/lib/markdownTitles.test.ts

scripts/js/lib/markdownTitles.ts

scripts/js/commands/checkMarkdown.ts

frankharkins

Awesome work, thanks again

frankharkins · 2025-08-12T07:51:12Z

scripts/js/lib/markdownImages.test.ts

+function parseMarkdown(markdown: string): Root {
+  return unified()
+    .use(remarkParse)
+    .use(remarkGfm)
+    .use(remarkFrontmatter, ["yaml"])
+    .parse(markdown);
+}


This is a nice helper function, what do you think about moving it to a new file (e.g. scripts/js/lib/markdownUtils.ts) and re-using it in checkMarkdown.ts?


scripts/js/commands/checkMarkdown.ts

Eric-Arellano

Awesome! Great work. I really like the approach of combining checks for better performance.

scripts/js/commands/checkMarkdown.ts

Eric-Arellano · 2025-08-12T13:39:56Z

scripts/js/lib/markdownImages.test.ts

+function parseMarkdown(markdown: string): Root {
+  return unified()
+    .use(remarkParse)
+    .use(remarkGfm)
+    .use(remarkFrontmatter, ["yaml"])
+    .parse(markdown);
+}


This comment still seems relevant

Eric-Arellano

Thanks!

Eric-Arellano · 2025-08-18T13:46:10Z

check

@@ -25,7 +25,7 @@ CHECKS = {
    "metadata": ["npm", "run", "check:metadata"],
    "patterns index pages": ["npm", "run", "check:patterns-index"],
    "tutorials index page": ["python3", "scripts/ci/check-tutorials-index.py"],
-    "images": ["npm", "run", "check:images"],
+    "images": ["npm", "run", "check:markdown"],


Suggested change

-    "images": ["npm", "run", "check:markdown"],
+    "markdown": ["npm", "run", "check:markdown"],

Eric-Arellano · 2025-08-18T13:49:47Z

scripts/js/lib/markdownTitles.ts

+// Helper to recursively extract visible text from heading node
+function extractText(node: any): string {


Can you please rename this to extractHeadingText, and the parameter to headingNode? Then you can remove the comment. We prefer "self-documenting code".

It'd be helpful to move this to markdownUtils.ts because we may want to use it in other places in the future. This is pretty generic code.

Eric-Arellano · 2025-08-18T13:51:30Z

scripts/js/lib/markdownTitles.ts

+    return node.children.map(extractText).join(" ");
+  }
+
+  return "";


Rather than returning an empty string, it's better to throw an error: new Error(`Could not parse heading node: ${node}`). This is "defensive programming": if we ever reach this line, one of our assumptions has been invalidated, which means there is an edge case we haven't thought about. We want to know that right away.

It is safe to error here because these are developer productivity scripts; we are not breaking users of IBM Quantum, only developers of these docs.
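A sketch of what the renamed helper with the defensive throw could look like. The `MdNode` type is a hypothetical stand-in for the mdast types, and the exact node shapes handled are assumptions. One deliberate difference from the message suggested above: interpolating an object directly into a template string yields "[object Object]", so this sketch uses JSON.stringify to keep the error readable.

```typescript
// Hypothetical minimal node shape; the real code would use the mdast types.
interface MdNode {
  type: string;
  value?: string;
  children?: MdNode[];
}

function extractHeadingText(headingNode: MdNode): string {
  // Leaf nodes such as `text` and `inlineCode` carry their text in `value`.
  if (typeof headingNode.value === "string") {
    return headingNode.value;
  }
  // Parent nodes such as `heading`, `emphasis`, and `link` are handled recursively.
  // The " " separator mirrors the snippet under review.
  if (Array.isArray(headingNode.children)) {
    return headingNode.children.map(extractHeadingText).join(" ");
  }
  // Anything else is an edge case we have not thought about: fail loudly.
  throw new Error(
    `Could not parse heading node: ${JSON.stringify(headingNode)}`,
  );
}

const exampleHeading: MdNode = {
  type: "heading",
  children: [
    { type: "text", value: "Install" },
    { type: "inlineCode", value: "qiskit" },
  ],
};
const exampleText = extractHeadingText(exampleHeading);
```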

Eric-Arellano · 2025-08-18T13:55:00Z

scripts/js/lib/markdownTitles.ts

+  });
+
+  // Compare and collect mismatch
+  if (frontmatterTitle && headingText && frontmatterTitle !== headingText) {


FYI, in the future, we will want to error if the frontmatterTitle or headingText is missing. Currently, check:metadata already does that. We'll consolidate the checks in a follow-up.

Eric-Arellano · 2025-08-18T14:04:36Z

scripts/js/lib/markdownTitles.test.ts

+import { collectHeadingTitleMismatch } from "./markdownTitles";
+import { parseMarkdown } from "./markdownUtils";
+
+test("Test for matching titles and headings", async () => {


Because all these tests are for the same function, it's more conventional to group the tests all together by using test.describe(), like this

documentation/scripts/js/lib/api/TocGrouping.test.ts

Lines 145 to 167 in eec1dae

    test.describe("Qiskit ToC mirrors index page sections", () => {
      test("validate assumptions", () => {
        validateTopLevelModuleAssumptions();
      });

      test("dev", async () => {
        await checkFolder("/dev");
      });

      test("latest", async () => {
        await checkFolder("");
      });

      test("historical releases (1.1+)", async () => {
        const folders = (
          await readdir("docs/api/qiskit", { withFileTypes: true })
        ).filter(
          (file) =>
            file.isDirectory() && file.name.match(/[0-9].*/) && +file.name >= 1.1,
        );
        for (const folder of folders) {
          await checkFolder(`/${folder.name}`);
        }

Here, you could use the string "collectHeadingTitleMismatch" as the description for test.describe(). Then, name the test() cases as:

"valid"

"mismatched - simple h1"

"mismatched - complex h1"

Furthermore, there is a lot of duplication between the tests. It is conventional to DRY (Don't Repeat Yourself) this, i.e. to deduplicate it. You can define a helper function:

    const assert = async (markdown: string, expected: Set<string>): Promise<void> => {
      const tree = parseMarkdown(markdown);
      const result = await collectHeadingTitleMismatch(tree);
      expect(result).toEqual(expected);
    };

Eric-Arellano · 2025-08-18T14:05:16Z

scripts/js/lib/markdownTitles.ts

+
+export async function collectHeadingTitleMismatch(
+  tree: Root,
+): Promise<Set<string>> {


Do you remember why Frank had you return a Set<string> rather than string | null?

Ah, I imagine it comes from copying the image code, right? That makes sense with images because there can be >1 error per file.

It'd be more accurate for this to return string | null. However, after looking at the checkMarkdown.ts script, that would make the code much more annoying to deal with. So, let's stick with this current implementation.
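To illustrate why keeping Set<string> is convenient for the caller, here is a small sketch. The function names and error message wording are hypothetical and are not taken from checkMarkdown.ts; the only assumption is that the script gathers errors from several checks per file.

```typescript
// Returns a Set even though at most one title mismatch can exist per file.
function collectTitleErrors(
  title: string | undefined,
  headingText: string | undefined,
): Set<string> {
  const errors = new Set<string>();
  if (title && headingText && title !== headingText) {
    errors.add(
      `Mismatched title: metadata is "${title}" but heading is "${headingText}"`,
    );
  }
  return errors;
}

// Hypothetical aggregation step: every check returns the same type,
// so the caller can merge results without special-casing null.
function mergeErrors(...errorSets: Set<string>[]): Set<string> {
  const merged = new Set<string>();
  for (const errors of errorSets) {
    errors.forEach((error) => merged.add(error));
  }
  return merged;
}

const sampleImageErrors = new Set<string>(["Image is missing alt text: diagram.png"]);
const sampleTitleErrors = collectTitleErrors("Hello", "Goodbye");
const allErrors = mergeErrors(sampleImageErrors, sampleTitleErrors);
```

With string | null, the caller would need a null check before merging each result, which is the extra handling this design avoids.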

Eric-Arellano · 2025-08-18T21:35:05Z

scripts/js/lib/markdownTitles.ts

+  let headingText: string | undefined;
+
+  // Extract frontmatter title
+  visit(tree, "yaml", (node: any) => {


I realized this isn't going to work with Jupyter notebooks. They set their markdown differently. We'll need to figure out a new approach. We can pair program on this

Eric-Arellano · 2025-08-27T01:44:24Z

Hey Roja, I thought of how to get this working with Jupyter notebooks. In markdownReader.ts, define a new asynchronous function readMarkdownAndMetadata with the same arguments as readMarkdown. Return Promise<{content: string; metadata: Record<string, string>}>. The implementation should look like readMarkdown, but you need to enhance it to also parse metadata by using the techniques in this function with grayMatter() and json.metadata:

documentation/scripts/js/commands/checkMetadata.ts

Lines 40 to 51 in 9dc9194

    
    const readMetadata = async (filePath: string): Promise<Record<string, any>> => {
      const ext = filePath.split(".").pop();
      if (ext === "md" || ext === "mdx") {
        const content = await fs.readFile(filePath, "utf-8");
        return grayMatter(content).data;
      } else if (ext === "ipynb") {
        const json = await readJsonFile(filePath);
        return json.metadata;
      } else {
        throw new Error(`Unknown extension for ${filePath}: ${ext}`);
      }
    };

(Why do we want to combine markdown reading and metadata reading into one function? We want to reuse the readFile() call so that we are not reading the raw file twice. This should help with performance. Filesystem operations are expensive, along with buffering the whole file in memory.)
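A rough sketch of the core of such a combined reader, written as a pure function over the raw file text so it runs without any packages. Everything here is an assumption for illustration: `parseRawFile` and `splitFrontmatter` are hypothetical names, `splitFrontmatter` is a simplified stand-in for grayMatter() that only understands flat `key: value` pairs (it does not strip quotes or handle nested YAML), and the notebook branch assumes the standard `cells` / `cell_type` / `source` layout. The real readMarkdownAndMetadata would read the file once and pass the raw text to logic like this.

```typescript
type MarkdownAndMetadata = {
  content: string;
  metadata: Record<string, string>;
};

// Simplified stand-in for grayMatter(): flat `key: value` pairs only.
function splitFrontmatter(raw: string): MarkdownAndMetadata {
  const match = raw.match(/^---\n([\s\S]*?)\n---\n?/);
  if (!match) {
    return { content: raw, metadata: {} };
  }
  const metadata: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const separator = line.indexOf(":");
    if (separator > 0) {
      const key = line.slice(0, separator).trim();
      metadata[key] = line.slice(separator + 1).trim();
    }
  }
  return { content: raw.slice(match[0].length), metadata };
}

// One pass over the raw text yields both the markdown and the metadata.
function parseRawFile(raw: string, ext: string): MarkdownAndMetadata {
  if (ext === "md" || ext === "mdx") {
    return splitFrontmatter(raw);
  }
  if (ext === "ipynb") {
    const json = JSON.parse(raw);
    // A cell's `source` may be a single string or a list of strings.
    const content = json.cells
      .filter((cell: any) => cell.cell_type === "markdown")
      .map((cell: any) =>
        Array.isArray(cell.source) ? cell.source.join("") : cell.source,
      )
      .join("\n\n");
    return { content, metadata: json.metadata || {} };
  }
  throw new Error(`Unknown extension: ${ext}`);
}

const mdResult = parseRawFile("---\ntitle: Hello\n---\n# Hello\n", "md");
const notebookResult = parseRawFile(
  JSON.stringify({
    metadata: { title: "Notebook page" },
    cells: [
      { cell_type: "markdown", source: ["# Notebook page\n", "Some text"] },
      { cell_type: "code", source: "x = 1" },
    ],
  }),
  "ipynb",
);
```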

You're then going to want to have checkMarkdown.ts call readMarkdownAndMetadata. collectInvalidImageErrors should stay the same, but collectHeadingTitleMismatch is going to change so that it takes both the parsed markdown tree and the metadata as arguments. Update collectHeadingTitleMismatch to simply read metadata.title, and remove the yaml parsing. Update the unit tests.
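One possible shape for the updated check, as a sketch. The `MdNode` type is a hypothetical stand-in for the mdast `Root`, the error message wording is invented, and the function is synchronous here for brevity whereas the version under review returns a Promise. It reads the title from the metadata argument and no longer visits a yaml node.

```typescript
// Hypothetical minimal node shape; the real code would use mdast's `Root`.
interface MdNode {
  type: string;
  depth?: number;
  value?: string;
  children?: MdNode[];
}

function headingToText(node: MdNode): string {
  if (typeof node.value === "string") return node.value;
  if (Array.isArray(node.children)) {
    return node.children.map(headingToText).join(" ");
  }
  throw new Error(`Could not parse heading node: ${JSON.stringify(node)}`);
}

function collectHeadingTitleMismatch(
  tree: MdNode,
  metadata: Record<string, string>,
): Set<string> {
  const errors = new Set<string>();
  const h1 = (tree.children || []).find(
    (node) => node.type === "heading" && node.depth === 1,
  );
  const title = metadata.title;
  // A missing title or heading is left to the metadata check, as discussed above.
  if (h1 && title) {
    const text = headingToText(h1);
    if (text !== title) {
      errors.add(`Mismatched title: metadata is "${title}" but heading is "${text}"`);
    }
  }
  return errors;
}

const sampleTree: MdNode = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Hello" }] },
  ],
};
const matching = collectHeadingTitleMismatch(sampleTree, { title: "Hello" });
const mismatched = collectHeadingTitleMismatch(sampleTree, { title: "Goodbye" });
```

Because the title now arrives through the metadata argument, the same check works for both markdown files and notebooks.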

Finally, refactor checkMetadata.ts to use your new readMarkdownAndMetadata function.

Added a new function and test case for mismatched title and heading

fb15293

GantaRoja requested review from Eric-Arellano and frankharkins August 6, 2025 14:49

github-project-automation bot added this to Docs Planning Aug 6, 2025

frankharkins reviewed Aug 6, 2025

scripts/js/lib/markdownTitles.test.ts (outdated, resolved)

scripts/js/lib/markdownTitles.ts (outdated, resolved)

scripts/js/lib/markdownTitles.ts (outdated, resolved)

GantaRoja added 3 commits August 6, 2025 16:13

lint errors fixed

176bc95

separaeted tests removed consoles

9ab781c

lint error fixed

32cfd9d

frankharkins reviewed Aug 7, 2025

scripts/js/lib/markdownTitles.ts (resolved)

scripts/js/lib/markdownTitles.ts (outdated, resolved)

called the function and rename of checkimages to checkMarkdoen

5b5fd21

frankharkins reviewed Aug 7, 2025

scripts/js/commands/checkMarkdown.ts (outdated, resolved)

scripts/js/commands/checkMarkdown.ts (resolved)

package.json (outdated, resolved)

GantaRoja added 2 commits August 8, 2025 11:46

checkpoint

86b5296

modified check file

4e6ac5f

frankharkins reviewed Aug 8, 2025

package.json (outdated, resolved)

GantaRoja added 2 commits August 8, 2025 16:13

corrected packagejson file

94d1e01

ignored files starts with doc/spi

6a9af5b

frankharkins reviewed Aug 11, 2025

scripts/js/lib/markdownTitles.ts (outdated, resolved)

scripts/js/lib/markdownTitles.test.ts (outdated, resolved)

scripts/js/lib/markdownTitles.ts (outdated, resolved)

scripts/js/commands/checkMarkdown.ts (outdated, resolved)

moved parsing step out of the function

8012257

frankharkins approved these changes Aug 12, 2025


frankharkins reviewed Aug 12, 2025

scripts/js/commands/checkMarkdown.ts (outdated, resolved)

frankharkins reviewed Aug 12, 2025

scripts/js/commands/checkMarkdown.ts (outdated, resolved)

frankharkins mentioned this pull request Aug 12, 2025

Review title and metadata mismatches #3700

Open

moved helper function to a new file

ee77032

Eric-Arellano reviewed Aug 12, 2025


modified few files

cae7d70

Eric-Arellano reviewed Aug 18, 2025


Eric-Arellano mentioned this pull request Aug 18, 2025

Merge metadata check into new markdown check #3750

Open

combined test caeses using describe

de37f43

