feat: generate page description with llm script (improvements from original PR) #2389
base: main
Walkthrough
Adds a CLI and programmatic tool to generate ~250-character page descriptions for Markdoc docs using an LLM. Introduces scripts/llm-generate-description.ts with two exports (getDocPageContent, generateDescriptionForDocsPage), a main entrypoint handling args (file path, skip existing), prompt construction, retries if over limit, and structured output. Updates package.json with a new script (generate:page-description) and dependencies (dedent, @ai-sdk/openai, @markdoc/markdoc, ai, front-matter, jsdom, tsx). Refactors svelte.config.js to export a reusable markdocSchema and uses it in preprocess.
Pre-merge checks (4 passed, 1 warning)
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (8)
scripts/llm-generate-description.ts (8)
1-7: Use node: specifiers and keep pathToFileURL for robust direct-run detection.
Minor import nit; it also sets up a better isDirect check below.
Apply:
-import { readFileSync } from "fs";
-import { pathToFileURL } from "url";
-import path from "path";
+import { readFileSync } from "node:fs";
+import { pathToFileURL } from "node:url";
+import path from "node:path";
+
+type FrontmatterValue =
+  | string
+  | number
+  | boolean
+  | Date
+  | string[]
+  | number[]
+  | boolean[]
+  | Record<string, unknown>
+  | null;
+export type FrontmatterAttributes = Record<string, FrontmatterValue>;
24-33: Type the frontmatter more explicitly.
Use a shared FrontmatterAttributes alias instead of an inline Record union.
-}: {
-  articleText: string;
-  frontmatterAttributes: Record<
-    string,
-    string | number | boolean | Date | string[] | number[] | boolean[]
-  >;
-}) {
+}: {
+  articleText: string;
+  frontmatterAttributes: FrontmatterAttributes;
+}) {
56-63: Stabilize generations.
Set a low temperature so repeated runs produce more consistent output.
 const { text: description } = await generateText({
   model: openai("gpt-4o-mini"),
   system: systemPrompt,
   prompt: userPrompt,
   maxTokens: 100,
+  temperature: 0.2,
 });
98-117: Add explicit return type for the exported API.
Improves DX and catches accidental shape changes.
-export async function getDocPageContent(markdocPath: string) {
+export async function getDocPageContent(
+  markdocPath: string,
+): Promise<{ articleText: string; frontmatterAttributes: FrontmatterAttributes }> {
119-123: Type the public function return.
Small DX improvement.
-export async function generateDescriptionForDocsPage(
+export async function generateDescriptionForDocsPage(
   filePath: string,
-  options: { skipIfExists?: boolean } = {},
-) {
+  options: { skipIfExists?: boolean } = {},
+): Promise<{ description: string; characterCount: number; skipped?: boolean }> {
127-131: Redundant null check.
front-matter always returns an attributes object; this branch is unreachable.
-  if (!frontmatterAttributes) {
-    throw new Error(
-      "Frontmatter attributes are undefined - file may be malformed",
-    );
-  }
209-216: More reliable direct-run detection in ESM/tsx.
Compare import.meta.url to argv[1] as a file URL.
-const isDirect =
-  process.argv[1] && process.argv[1].endsWith("llm-generate-description.ts");
+const argv1 = process.argv[1];
+const isDirect = !!argv1 && import.meta.url === pathToFileURL(argv1).href;
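The comparison suggested above can be pulled into a small pure helper, which makes it easy to test in isolation. This is a minimal sketch and not the script's actual code: `isDirectRun` is a hypothetical name, and in the real module the first argument would be `import.meta.url`.

```typescript
import { pathToFileURL } from "node:url";

// Hypothetical helper: reports whether the module identified by `moduleUrl`
// is the script Node was asked to run.
// `moduleUrl` is expected to be import.meta.url; `argv1` is process.argv[1].
export function isDirectRun(
  moduleUrl: string,
  argv1: string | undefined,
): boolean {
  if (!argv1) return false;
  // pathToFileURL resolves the path to an absolute file:// URL,
  // so the two sides are compared in the same form.
  return moduleUrl === pathToFileURL(argv1).href;
}
```

Inside the script this would be called as `isDirectRun(import.meta.url, process.argv[1])`. Unlike the `endsWith` check, it does not fire when some other file with the same name is the entry point.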
169-184: Use pnpm in help text.
Replace npm with pnpm in the Usage / Examples / CI lines in scripts/llm-generate-description.ts.
Occurrences: scripts/llm-generate-description.ts — lines 170, 178-179, 182
-Usage:
-  npm run generate:page-description -- --file-path <path> [options]
+Usage:
+  pnpm run generate:page-description -- --file-path <path> [options]
@@
-Examples:
-  npm run generate:page-description -- --file-path ./blog-post.markdoc
-  npm run generate:page-description -- --file-path ./blog-post.markdoc --skip-existing
+Examples:
+  pnpm run generate:page-description -- --file-path ./blog-post.markdoc
+  pnpm run generate:page-description -- --file-path ./blog-post.markdoc --skip-existing
@@
-For CI/CD usage:
-  npm run generate:page-description -- --file-path ./new-post.markdoc --skip-existing
+For CI/CD usage:
+  pnpm run generate:page-description -- --file-path ./new-post.markdoc --skip-existing
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (3)
package.json (2 hunks)
scripts/llm-generate-description.ts (1 hunks)
svelte.config.js (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Tests
svelte.config.js
[warning] 1-1: Prettier formatting issue detected. Run 'prettier --write' to fix.
scripts/llm-generate-description.ts
[warning] 1-1: Prettier formatting issue detected. Run 'prettier --write' to fix.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (6)
package.json (2)
16-16: Script entrypoint LGTM.
CLI wired via tsx is appropriate and consistent with the ESM setup.
67-73: New devDependencies look correct for the CLI.
All additions belong in devDependencies for a tooling-only workflow.
svelte.config.js (3)
8-26: Centralized Markdoc schema export looks good.
This improves reuse and keeps preprocess config single-sourced.
33-33: Good switch to use the shared schema in preprocess.
This prevents config drift between CLI and app.
8-26: Fix Prettier warning in CI.
Run: pnpm format (or prettier --write .) to resolve the reported formatting issue.
scripts/llm-generate-description.ts (1)
1-1: Fix Prettier warning in CI.
Run: pnpm format (or prettier --write .) to resolve the reported formatting issue.
    const trimmedDescription = description.trim();
    const characterCount = trimmedDescription.length;

    // If the description is too long, try again with a more specific prompt
    if (characterCount > 250) {
      const retryPrompt = `The previous description was too long (${characterCount} characters). Generate a shorter description (maximum 250 characters) for this page:

Title: ${frontmatterAttributes?.title || "Untitled"}
Content: ${articleText.substring(0, 500)}...

Make it concise and under 250 characters.`;

      const { text: retryDescription } = await generateText({
        model: openai("gpt-4o-mini"),
        system: systemPrompt,
        prompt: retryPrompt,
        maxTokens: 80,
      });

      const finalDescription = retryDescription.trim();
      return {
        description: finalDescription,
        characterCount: finalDescription.length,
      };
    }

    return { description: trimmedDescription, characterCount };
  } catch (error) {
🛠️ Refactor suggestion
Enforce ≤250 chars and sanitize output (single line, ASCII hyphens, quotes).
Guarantees meta-safe text even if the model overruns.
- const trimmedDescription = description.trim();
- const characterCount = trimmedDescription.length;
+ const sanitize = (s: string) =>
+ s.replace(/\s+/g, " ").replace(/[–—]/g, "-").replace(/"/g, "'").trim();
+ const trimmedDescription = sanitize(description);
+ let characterCount = trimmedDescription.length;
@@
- const { text: retryDescription } = await generateText({
+ const { text: retryDescription } = await generateText({
model: openai("gpt-4o-mini"),
system: systemPrompt,
prompt: retryPrompt,
- maxTokens: 80,
+ maxTokens: 80,
+ temperature: 0.2,
});
- const finalDescription = retryDescription.trim();
+ const retrimmed = sanitize(retryDescription);
+ const finalDescription =
+ retrimmed.length > 250
+ ? retrimmed.slice(0, 247).replace(/\s+\S*$/, "") + "…"
+ : retrimmed;
return {
description: finalDescription,
characterCount: finalDescription.length,
};
}
- return { description: trimmedDescription, characterCount };
+ const final =
+ trimmedDescription.length > 250
+ ? trimmedDescription.slice(0, 247).replace(/\s+\S*$/, "") + "…"
+ : trimmedDescription;
+ return { description: final, characterCount: final.length };
Also applies to: 76-88
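The sanitize-then-truncate step from the diff above can be written as one standalone function. The sketch below is an illustration under stated assumptions, not the reviewer's exact code: `toMetaDescription` is a hypothetical name, it also straightens curly quotes (which the diff does not), and it ends truncated text with an ASCII `...` rather than the `…` character used in the diff so the result stays ASCII-only.

```typescript
const MAX_LENGTH = 250;

// Hypothetical helper: turns raw model output into a single-line description
// no longer than `maxLength` characters.
export function toMetaDescription(raw: string, maxLength = MAX_LENGTH): string {
  const singleLine = raw
    .replace(/[\u2018\u2019]/g, "'") // curly single quotes -> '
    .replace(/[\u201C\u201D]/g, '"') // curly double quotes -> "
    .replace(/[\u2013\u2014]/g, "-") // en/em dashes -> ASCII hyphen
    .replace(/\s+/g, " ") // newlines and runs of whitespace -> one space
    .trim();

  if (singleLine.length <= maxLength) return singleLine;

  // Reserve three characters for the ellipsis, then drop the trailing
  // partial word so the cut lands on a word boundary where possible.
  const cut = singleLine.slice(0, maxLength - 3).replace(/\s+\S*$/, "");
  return `${cut}...`;
}
```

If the description is later embedded in a double-quoted attribute or frontmatter string, mapping `"` to `'` as the diff does may be the safer choice.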
🤖 Prompt for AI Agents
In scripts/llm-generate-description.ts around lines 64-91, the generated
description paths (both initial and retry) need to enforce a hard ≤250-character
limit and sanitize the string into a single-line, meta-safe form: after
trimming, replace newlines with a single space and collapse multiple spaces,
normalize smart quotes to straight ASCII quotes, replace en/em-dashes with ASCII
hyphen, optionally remove or normalize other problematic unicode if present,
then truncate to 250 characters and re-trim; compute characterCount from this
sanitized/truncated string and return that value. Apply the same
sanitization/truncation logic to the retryDescription path (lines ~76-88) so
both branches return a single-line ASCII-safe description no longer than 250
chars.
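One way to make sure both branches end in the same limit check is to route the initial attempt and the retry through a single function. The following is a rough sketch only: `describeWithRetry` and the `Generate` type are hypothetical, the LLM call is replaced by an injected function so the flow can run without an API key, and the final cap is a plain slice rather than the word-boundary truncation discussed above.

```typescript
// Any async text generator; in the script this role is played by the LLM call.
type Generate = (prompt: string) => Promise<string>;

export interface DescriptionResult {
  description: string;
  characterCount: number;
  attempts: number;
}

// Hypothetical helper: one attempt, at most one retry with a stricter prompt,
// then a hard cap so the returned text never exceeds `maxLength`.
export async function describeWithRetry(
  generate: Generate,
  basePrompt: string,
  maxLength = 250,
): Promise<DescriptionResult> {
  let description = (await generate(basePrompt)).trim();
  let attempts = 1;

  if (description.length > maxLength) {
    const retryPrompt =
      `The previous description was too long (${description.length} characters). ` +
      `Generate a shorter description (maximum ${maxLength} characters).\n\n` +
      basePrompt;
    description = (await generate(retryPrompt)).trim();
    attempts = 2;
  }

  // Shared final step: applies no matter which branch produced the text.
  if (description.length > maxLength) {
    description = description.slice(0, maxLength).trimEnd();
  }

  return { description, characterCount: description.length, attempts };
}
```

Because the cap sits after the branch, a model that overruns on the retry still yields a result within the limit.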
What does this PR do?
This PR implements an LLM-powered description generator for blog posts, building on the initial work from PR #2368 by @arielweinberger.
Key Improvements
- Replaced any types with proper TypeScript interfaces
- Added a --skip-existing flag for automated workflows (for future use so it's not missed)
Usage
Test Plan
Manual Testing Steps:
Install Dependencies:
Test Script with Help:
Expected: Shows usage instructions and available options
Test Script with Sample File:
npm run generate:page-description -- --file-path "./src/routes/changelog/(entries)/2025-07-10.markdoc"
Expected: Generates SEO description and displays it in terminal
Test Skip Existing Flag:
npm run generate:page-description -- --file-path "./src/routes/changelog/(entries)/2025-07-10.markdoc" --skip-existing
Expected: Skips file if description already exists, or generates description if none exists
Test Error Handling:
npm run generate:page-description -- --file-path "./nonexistent-file.markdoc"
Expected: Shows appropriate error message
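The flags exercised in the steps above could be parsed with Node's built-in `parseArgs`. This is only a sketch of one possible approach; the PR page does not show how the script actually parses its arguments, and `parseCliArgs` and `CliOptions` are hypothetical names.

```typescript
import { parseArgs } from "node:util";

export interface CliOptions {
  filePath: string;
  skipExisting: boolean;
}

// Hypothetical parser for the two flags used in the test plan.
// Unknown flags cause parseArgs to throw, since strict mode is its default.
export function parseCliArgs(argv: string[]): CliOptions {
  const { values } = parseArgs({
    args: argv,
    options: {
      "file-path": { type: "string" },
      "skip-existing": { type: "boolean", default: false },
    },
  });

  const filePath = values["file-path"];
  if (!filePath) {
    throw new Error("Missing required option: --file-path <path>");
  }

  return { filePath, skipExisting: values["skip-existing"] === true };
}
```

A script would typically call it as `parseCliArgs(process.argv.slice(2))` and print the usage text when it throws.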
Verification:
Note: Requires a valid OPENAI_API_KEY environment variable for full functionality.
Related PRs and Issues
Credits
Closes #2368
Have you read the Contributing Guidelines on issues?
yes
Summary by CodeRabbit
New Features
Refactor
Chores