feat: generate page description with llm script #2368

arielweinberger · 2025-09-04T18:45:08Z

How to use this script

First, install dependencies:

pnpm i

Ensure the OPENAI_API_KEY environment variable is set.

Then:

pnpm run generate:page-description -- --file-path ./src/routes/docs/quick-starts/react/+page.markdoc

Output:

> tsx ./scripts/llm-generate-description.ts -- --file-path ./src/routes/docs/quick-starts/react/+page.markdoc

Generating description...
Description generated successfully (30 characters)
Generated description (30 characters):

Start building React apps with Appwrite: set up a Vite project, install the Appwrite JS SDK, configure your project, and add email/password authentication (register, login, logout) using Appwrite's Account API.

Summary by CodeRabbit

New Features
- Added a CLI tool to generate SEO-friendly page descriptions for documentation pages.
Refactor
- Centralized the documentation/Markdoc schema configuration used by the preprocessing pipeline.
Chores
- Added a package script to run the generation workflow and new development dependencies to support it.

appwrite · 2025-09-04T18:45:12Z

appwrite.io

Project ID: 684969cb000a2f6c0a02

Sites (1)

Site	Status	Logs	Preview	QR
website `68496a17000f03d62013`	Building	View Logs	Preview URL


coderabbitai · 2025-09-04T18:45:15Z

Walkthrough

Adds a CLI to generate SEO-focused page descriptions: new script generate:page-description runs scripts/llm-generate-description.ts, which reads a Markdoc file, parses frontmatter via front-matter, and returns articleText plus frontmatter attributes. The module exports getDocPageContent and generateDescriptionForDocsPage(filePath, { skipIfExists }). It calls OpenAI (gpt-4o-mini) to produce ≤250-char descriptions with a retry if too long, supports --file-path, --skip-existing, and --help, and prints/logs results. svelte.config.js now exports markdocSchema and uses it in preprocessing. DevDependencies for parsing, Markdoc, OpenAI, and CLI runtime were added.

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title succinctly and accurately describes the primary change: adding an LLM-backed script to generate page descriptions. It follows conventional commit style ("feat:") and is concise without extraneous details. The title directly reflects the changes made to package.json and scripts/llm-generate-description.ts.


ItzNotABug · 2025-09-05T05:53:27Z

scripts/llm-generate-description.ts

+    previousAttempt
+}: {
+    articleText: string;
+    frontmatterAttributes: Record<string, any>;


Can we avoid `any` if possible? Our frontmatter values are mostly strings, dates, or integers at most.
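A minimal sketch of one way to drop `any`, assuming frontmatter only holds strings, numbers, booleans, dates, or string arrays. `FrontmatterValue`, `toFrontmatterAttributes`, and `getStringField` are hypothetical names, not part of the script; the idea is to narrow the `unknown` values a parser returns instead of typing them as `any`.

```typescript
// Hypothetical value type: the shapes the reviewer lists (strings, dates,
// integers) plus booleans and string arrays for fields such as tags.
type FrontmatterValue = string | number | boolean | Date | string[];
type FrontmatterAttributes = Record<string, FrontmatterValue>;

// Narrow one parsed value from `unknown`; unsupported shapes yield undefined.
function toFrontmatterValue(value: unknown): FrontmatterValue | undefined {
  if (
    typeof value === "string" ||
    typeof value === "number" ||
    typeof value === "boolean" ||
    value instanceof Date
  ) {
    return value;
  }
  if (Array.isArray(value) && value.every((item) => typeof item === "string")) {
    return value as string[];
  }
  return undefined;
}

// Build a typed attribute map, dropping values with unsupported shapes.
function toFrontmatterAttributes(raw: Record<string, unknown>): FrontmatterAttributes {
  const result: FrontmatterAttributes = {};
  for (const [key, value] of Object.entries(raw)) {
    const narrowed = toFrontmatterValue(value);
    if (narrowed !== undefined) {
      result[key] = narrowed;
    }
  }
  return result;
}

// Callers check the type where they read a field instead of trusting `any`.
function getStringField(attrs: FrontmatterAttributes, key: string): string | undefined {
  const value = attrs[key];
  return typeof value === "string" ? value : undefined;
}
```

Nested objects are dropped rather than passed through, and the prompt builder would read `title` and `summary` through a typed accessor such as `getStringField`.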


coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (3)

svelte.config.js (1)

8-25: Good centralization; consider moving schema to a standalone module to avoid loading SvelteKit config at runtime.

Importing svelte.config.js from scripts will evaluate adapters and other config. Extracting markdocSchema to e.g. src/markdoc/schema.js reduces coupling and speeds CLIs.

Example (new file):

// src/markdoc/schema.js
import { dirname, join } from 'path';
import { fileURLToPath } from 'url';

const absolute = (p) => join(dirname(fileURLToPath(import.meta.url)), '..', '..', p);

export const markdocSchema = {
  generateSchema: true,
  nodes: absolute('src/markdoc/nodes/_Module.svelte'),
  tags: absolute('src/markdoc/tags/_Module.svelte'),
  partials: absolute('src/partials'),
  layouts: {
    default: absolute('src/markdoc/layouts/Article.svelte'),
    article: absolute('src/markdoc/layouts/Article.svelte'),
    tutorial: absolute('src/markdoc/layouts/Tutorial.svelte'),
    post: absolute('src/markdoc/layouts/Post.svelte'),
    partner: absolute('src/markdoc/layouts/Partner.svelte'),
    author: absolute('src/markdoc/layouts/Author.svelte'),
    category: absolute('src/markdoc/layouts/Category.svelte'),
    policy: absolute('src/markdoc/layouts/Policy.svelte'),
    changelog: absolute('src/markdoc/layouts/Changelog.svelte'),
    integration: absolute('src/markdoc/layouts/Integration.svelte')
  }
};

Then:

In svelte.config.js: import { markdocSchema } from './src/markdoc/schema.js'
In the script: import { markdocSchema } from '../src/markdoc/schema.js'

scripts/llm-generate-description.ts (2)

139-149: Include character count in the final output to match logs/PR summary.

Apply:
-    console.log(`Generated description:\n\n${description}\n`);
+    console.log(`Generated description (${characterCount} characters):\n\n${description}\n`);

31-77: Optional: tighten the prompt for single-line meta output.

Ask for a single line with no surrounding quotes, and omit the trailing period if the description is already at the limit.

Apply:
           - Output must be worthy of being used as a meta description.
+          - Output must be a single line with no surrounding quotes.
+          - Prefer <= ${MAX_CHARS} characters; do not exceed. If needed, omit trailing punctuation to fit.
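Prompt rules are not guaranteed to be followed, so the same constraints can also be enforced after the call. A minimal sketch with a hypothetical `toMetaDescription` helper (not in the script): it collapses the output onto one line and strips a single pair of wrapping quotes.

```typescript
// Quote pairs a model commonly wraps its answer in (assumed list).
const WRAPPING_QUOTES: Array<[string, string]> = [
  ['"', '"'],
  ["'", "'"],
  ["\u201C", "\u201D"], // curly double quotes
];

// Hypothetical helper: enforce the single-line, unquoted format the prompt asks for.
function toMetaDescription(raw: string): string {
  // Collapse newlines and runs of whitespace into single spaces.
  let text = raw.replace(/\s+/g, " ").trim();
  // Remove one pair of surrounding quotes if present.
  for (const [open, close] of WRAPPING_QUOTES) {
    if (text.length >= 2 && text.startsWith(open) && text.endsWith(close)) {
      text = text.slice(1, -1).trim();
      break;
    }
  }
  return text;
}
```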

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 3da86c9 and 0ac69b2.

⛔ Files ignored due to path filters (1)

pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (3)

package.json (2 hunks)
scripts/llm-generate-description.ts (1 hunks)
svelte.config.js (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. (2)

GitHub Check: build
GitHub Check: tests

🔇 Additional comments (4)

svelte.config.js (1)

33-33: LGTM: reusing the centralized schema in preprocess.

package.json (2)

16-16: Script entry is correct; note that the CLI will currently no-op due to a direct-execution check bug in the script.

See suggested fix in scripts/llm-generate-description.ts to make pnpm run generate:page-description actually invoke main().

67-73: AI/LLM deps added as devDependencies; confirm the install context.

If this CLI is ever run in a production or CI environment installed with `pnpm install --prod`, these won’t be installed. If that’s intended, fine; otherwise move them to dependencies.

scripts/llm-generate-description.ts (1)

31-33: Model ID validated
“gpt-5-mini” is supported by @ai-sdk/openai and is the recommended compact option for short SEO copy.

coderabbitai · 2025-09-05T05:57:54Z

scripts/llm-generate-description.ts

+if (!process.env.OPENAI_API_KEY) {
+    throw new Error('OPENAI_API_KEY is not set');
+}


⚠️ Potential issue

Do not throw at import-time if OPENAI_API_KEY is missing.

Top-level throw breaks importing the module (tests, tooling). Validate inside main() (or right before the LLM call) instead.

Apply:

-if (!process.env.OPENAI_API_KEY) {
-    throw new Error('OPENAI_API_KEY is not set');
-}

And add after the file-path check in main():

     if (!filePathArg) {
         throw new Error('File path is required');
     }
+    if (!process.env.OPENAI_API_KEY) {
+        throw new Error('OPENAI_API_KEY is required');
+    }

🤖 Prompt for AI Agents

In scripts/llm-generate-description.ts around lines 14-16, do not throw at module import when OPENAI_API_KEY is missing; move the environment validation into main() (or right before the LLM call) so importing the module (tests/tooling) won't crash. Remove the top-level throw and instead add a check after the file-path check in main(): if OPENAI_API_KEY is missing, log a clear error and throw or exit there before any LLM calls so the module loads safely but runtime fails fast when actually executed.
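A minimal sketch of the suggested ordering, assuming a hand-rolled `--file-path` parser. `requireEnv` and this `main` are hypothetical and take the environment as a parameter; importing the module has no side effects, and the key is only checked when `main()` runs.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: read a required variable only when it is needed,
// so importing the module never throws.
function requireEnv(name: string, env: Env): string {
  const value = env[name];
  if (!value) {
    throw new Error(`${name} is required to generate a page description`);
  }
  return value;
}

// Hypothetical main(): check the CLI argument first, then the API key,
// and only then make the LLM call (omitted in this sketch).
async function main(argv: string[], env: Env): Promise<string> {
  const flagIndex = argv.indexOf("--file-path");
  const filePathArg = flagIndex >= 0 ? argv[flagIndex + 1] : undefined;
  if (!filePathArg) {
    throw new Error("File path is required. Use --help for usage information.");
  }
  requireEnv("OPENAI_API_KEY", env);
  // The real script would call the model here; the sketch returns the path.
  return filePathArg;
}
```

In the script, the call would be `main(process.argv.slice(2), process.env)`.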

coderabbitai · 2025-09-05T05:57:55Z

scripts/llm-generate-description.ts

+async function generateDescription({
+    articleText,
+    frontmatterAttributes,
+    previousAttempt
+}: {
+    articleText: string;
+    frontmatterAttributes: Record<string, any>;
+    previousAttempt?: {
+        text: string;
+        characterCount: number;
+    };
+}) {


🛠️ Refactor suggestion

Unbounded recursion risk; cap attempts.

Add an attempt counter with a sane max; fall back by truncating on last attempt.

Apply:

-async function generateDescription({
+const MAX_CHARS = 250;
+const MAX_ATTEMPTS = 3;
+
+async function generateDescription({
     articleText,
     frontmatterAttributes,
-    previousAttempt
+    previousAttempt,
+    attempt = 1
 }: {
     articleText: string;
     frontmatterAttributes: Record<string, any>;
     previousAttempt?: {
         text: string;
         characterCount: number;
-    };
+    };
+    attempt?: number;
 }) {

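And update the retry block (it assumes a `normalized` variable holding the trimmed model output, which this suggestion does not define):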

-    if (characterCount > 250) {
-        console.log(`Character count is too long (${characterCount}), generating again...`);
-        return generateDescription({
+    if (characterCount > MAX_CHARS) {
+        if (attempt >= MAX_ATTEMPTS) {
+            const clipped = normalized.slice(0, MAX_CHARS);
+            return { description: clipped, characterCount: MAX_CHARS };
+        }
+        console.log(`Over limit (${characterCount} chars). Regenerating (attempt ${attempt + 1}/${MAX_ATTEMPTS})...`);
+        return generateDescription({
             articleText,
             frontmatterAttributes,
             previousAttempt: {
-                text,
-                characterCount
-            }
+                text: normalized,
+                characterCount
+            },
+            attempt: attempt + 1
         });
     }

🤖 Prompt for AI Agents

In scripts/llm-generate-description.ts around lines 18-29, the generateDescription function lacks a bounded retry/recursion mechanism; add an attempt counter parameter (e.g., attempt = 0) and a MAX_ATTEMPTS constant (e.g., 3-5). On each retry increment attempt and, if attempt >= MAX_ATTEMPTS, stop recursing and fall back by truncating articleText to a safe length (or use previousAttempt.text truncated) before making the final call; otherwise retry normally. Update the retry block to pass attempt + 1 on recursive calls and to perform the truncation fallback when the max is reached. Ensure types/signature reflect the new optional attempt parameter.
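A minimal sketch of the bounded retry, written as a loop instead of recursion so the attempt limit is explicit. The model call is injected through a hypothetical `generate` callback so the example runs without the AI SDK; `MAX_CHARS` and `MAX_ATTEMPTS` mirror the values in the suggestion. Like the suggested diff, it clamps the output after the last over-long attempt.

```typescript
const MAX_CHARS = 250;
const MAX_ATTEMPTS = 3;

type Attempt = { text: string; characterCount: number };

// Hypothetical callback standing in for the model call; it receives the previous
// over-long attempt (if any) so it can ask for a shorter description.
type Generate = (previous?: Attempt) => Promise<string>;

async function generateBounded(generate: Generate): Promise<Attempt & { attempts: number }> {
  let previous: Attempt | undefined;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const text = (await generate(previous)).replace(/\s+/g, " ").trim();
    const characterCount = Array.from(text).length;
    if (characterCount <= MAX_CHARS) {
      return { text, characterCount, attempts: attempt };
    }
    previous = { text, characterCount };
  }
  // Every attempt was too long: clamp the last one instead of retrying forever.
  const clipped = Array.from(previous?.text ?? "").slice(0, MAX_CHARS).join("");
  return { text: clipped, characterCount: Array.from(clipped).length, attempts: MAX_ATTEMPTS };
}
```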


coderabbitai · 2025-09-05T05:57:55Z

scripts/llm-generate-description.ts

+    // Use JSDOM to parse the HTML and extract text content from <article>
+    const dom = new JSDOM(html);
+    const articleElement = dom.window.document.querySelector('article');
+    const articleText = articleElement ? articleElement.textContent : '';
+    return {
+        articleText,
+        frontmatterAttributes: frontmatter.attributes
+    };


🛠️ Refactor suggestion

Make article extraction resilient and normalized.

Fall back to <main>, then <body>, and collapse whitespace.

Apply:

-    const dom = new JSDOM(html);
-    const articleElement = dom.window.document.querySelector('article');
-    const articleText = articleElement ? articleElement.textContent : '';
+    const dom = new JSDOM(html);
+    const { document } = dom.window;
+    // Query each selector in turn: 'article, main' would return whichever comes first in the DOM.
+    const articleElement =
+        document.querySelector('article') ?? document.querySelector('main') ?? document.body;
+    const articleText = (articleElement?.textContent || '').replace(/\s+/g, ' ').trim();


🤖 Prompt for AI Agents

In scripts/llm-generate-description.ts around lines 114 to 121, the article extraction should be more resilient and normalized: update the DOM query to fallback to <main> then <body> when <article> is missing, extract textContent from the first available element, collapse consecutive whitespace into single spaces and trim leading/trailing whitespace, and return that normalized string as articleText along with frontmatter.attributes.
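A dependency-free sketch of the normalized extraction with an explicit article, main, body priority. `DocumentLike` and `extractArticleText` are hypothetical; a JSDOM `dom.window.document` should satisfy the interface, but a plain object can stand in for it so the example runs without JSDOM.

```typescript
// Minimal structural types covering only what this sketch reads from a Document.
interface NodeLike {
  textContent: string | null;
}
interface DocumentLike {
  querySelector(selector: string): NodeLike | null;
  body: NodeLike | null;
}

// Prefer <article>, then <main>, then <body>, querying each selector separately
// so the priority does not depend on document order.
function extractArticleText(document: DocumentLike): string {
  const element =
    document.querySelector("article") ?? document.querySelector("main") ?? document.body;
  // Collapse all whitespace runs (including newlines) into single spaces.
  return (element?.textContent ?? "").replace(/\s+/g, " ").trim();
}
```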

coderabbitai · 2025-09-05T05:57:55Z

scripts/llm-generate-description.ts

+export async function generateDescriptionForDocsPage(filePath: string) {
+    const resolvedPath = path.resolve(filePath);
+    const { articleText, frontmatterAttributes } = await getDocPageContent(resolvedPath);
+
+    if (!articleText || !frontmatterAttributes) {
+        throw new Error('Article text or frontmatter attributes are undefined');
+    }
+
+    const { description, characterCount } = await generateDescription({
+        articleText,
+        frontmatterAttributes
+    });
+    return { description, characterCount };
+}


🛠️ Refactor suggestion

Gracefully handle empty article text; use frontmatter as fallback.

Avoid hard failure when the page has little/no static text.

Apply:

-    const { articleText, frontmatterAttributes } = await getDocPageContent(resolvedPath);
-
-    if (!articleText || !frontmatterAttributes) {
-        throw new Error('Article text or frontmatter attributes are undefined');
-    }
-
-    const { description, characterCount } = await generateDescription({
-        articleText,
-        frontmatterAttributes
-    });
+    const { articleText, frontmatterAttributes } = await getDocPageContent(resolvedPath);
+    const pageContent =
+        articleText?.trim() ||
+        [frontmatterAttributes?.title, frontmatterAttributes?.description, frontmatterAttributes?.summary]
+            .filter(Boolean)
+            .join(': ');
+    if (!pageContent) {
+        throw new Error('No content available to generate a description');
+    }
+    const { description, characterCount } = await generateDescription({
+        articleText: pageContent,
+        frontmatterAttributes
+    });


🤖 Prompt for AI Agents

In scripts/llm-generate-description.ts around lines 124 to 137, the function currently throws if articleText or frontmatterAttributes are missing; change it to gracefully handle empty article text by using frontmatter as a fallback: check if articleText is falsy or trim().length === 0, and if so construct a fallback source (prefer frontmatterAttributes.description, else frontmatterAttributes.title and any summary fields joined) and pass that into generateDescription; only throw if frontmatterAttributes is also entirely missing and no fallback can be constructed, then return the resulting description and characterCount as before.
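A minimal sketch of the fallback described above, with a hypothetical `buildPageContent` helper: whitespace-only article text counts as empty, and the frontmatter `title`, `description`, and `summary` fields are joined only when they are non-empty strings.

```typescript
type Attributes = Record<string, string | number | boolean | Date | string[] | undefined>;

// Hypothetical helper: use the article body when it has real text,
// otherwise fall back to selected frontmatter fields.
function buildPageContent(articleText: string | undefined, attrs: Attributes): string {
  const body = articleText?.trim();
  if (body) return body;
  const fallback = [attrs.title, attrs.description, attrs.summary]
    .filter((value): value is string => typeof value === "string" && value.trim() !== "")
    .join(": ");
  if (!fallback) {
    throw new Error("No content available to generate a description");
  }
  return fallback;
}
```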

coderabbitai · 2025-09-05T05:57:55Z

scripts/llm-generate-description.ts

+// Runs only if invoked via CLI
+// @ts-ignore
+const isDirect = import.meta.url === pathToFileURL(process.argv[1]).href;
+if (isDirect) {
+    main().catch((err) => {
+        console.error(err);
+        process.exit(1);
+    });
+}


⚠️ Potential issue

Fix direct-execution check for tsx.

With pnpm run generate:page-description (tsx), process.argv[1] is the tsx binary, so main() never runs.

Apply:

-const isDirect = import.meta.url === pathToFileURL(process.argv[1]).href;
+const scriptPath = process.argv[1]?.endsWith('tsx') ? process.argv[2] : process.argv[1];
+const isDirect =
+    !!scriptPath && import.meta.url === pathToFileURL(path.resolve(scriptPath)).href;


🤖 Prompt for AI Agents

In scripts/llm-generate-description.ts around lines 151–159, the direct-execution check fails when run via tsx because process.argv[1] points to the tsx binary rather than the script; update the check to locate the actual script arg (search process.argv for the argument whose basename matches this script like llm-generate-description.ts or .tsx), convert that found arg to a file:// URL with pathToFileURL and compare it to import.meta.url; if no matching arg is found fall back to the existing check so main() runs correctly when invoked directly by node or tsx.
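A minimal sketch of the argv-scanning check described in the prompt above. `isDirectRun` is a hypothetical helper that takes the module URL as a parameter so it can be exercised outside the script; there it would be called as `isDirectRun(process.argv, import.meta.url, "llm-generate-description.ts")`.

```typescript
import * as path from "node:path";
import { pathToFileURL } from "node:url";

// Hypothetical helper: find the argv entry whose basename matches the script,
// fall back to argv[1], and compare its file:// URL with the module URL.
function isDirectRun(argv: string[], moduleUrl: string, scriptBaseName: string): boolean {
  const candidate =
    argv.slice(1).find((arg) => path.basename(arg) === scriptBaseName) ?? argv[1];
  if (!candidate) return false;
  // path.resolve makes relative paths absolute against the current working directory.
  return pathToFileURL(path.resolve(candidate)).href === moduleUrl;
}
```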

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

scripts/llm-generate-description.ts (2)

187-194: Fail fast with a clear error if OPENAI_API_KEY is missing.

Surface a helpful message before making the API call.

   if (!filePathArg) {
     throw new Error("File path is required. Use --help for usage information.");
   }
 
+  if (!process.env.OPENAI_API_KEY) {
+    throw new Error("OPENAI_API_KEY is required to generate a page description");
+  }
+
   const resolvedPath = path.resolve(filePathArg);
   const { description, characterCount, skipped } =
     await generateDescriptionForDocsPage(resolvedPath, { skipIfExists });

209-216: Fix CLI direct-execution check to work with pnpm/tsx/ts-node.

Current endsWith check can miss when invoked via tsx, so main() may not run.

-const isDirect =
-  process.argv[1] && process.argv[1].endsWith("llm-generate-description.ts");
-if (isDirect) {
+const argv = process.argv.slice(1);
+const candidate = argv.find((a) =>
+  /llm-generate-description\.(t|j)sx?$/i.test(a)
+);
+const scriptPath = candidate ? path.resolve(candidate) : undefined;
+const isDirect =
+  !!scriptPath && import.meta.url === pathToFileURL(scriptPath).href;
+if (isDirect) {
   main().catch((err) => {
     console.error(err);
     process.exit(1);
   });
 }

🧹 Nitpick comments (1)

scripts/llm-generate-description.ts (1)

191-205: Optional: Add small progress logs to match the PR’s usage examples.

Improves UX parity (“Generating…”, “Generated successfully (N characters)”).

-  const resolvedPath = path.resolve(filePathArg);
-  const { description, characterCount, skipped } =
+  const resolvedPath = path.resolve(filePathArg);
+  console.log("Generating description...");
+  const { description, characterCount, skipped } =
     await generateDescriptionForDocsPage(resolvedPath, { skipIfExists });
 
   if (skipped) {
     console.log("✅ File skipped - description already exists");
     return;
   }
 
+  console.log(`Description generated successfully (${characterCount} characters)`);
   console.log(
     `================ DESCRIPTION START (character count: ${characterCount}) =================`,
   );

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ac69b2 and ea9b789.

📒 Files selected for processing (1)

scripts/llm-generate-description.ts (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. (2)

GitHub Check: tests
GitHub Check: build

🔇 Additional comments (1)

scripts/llm-generate-description.ts (1)

98-117: LGTM: simple, robust Markdoc frontmatter + raw body extraction.

Using raw Markdoc body avoids JSDOM/renderer complexity and is fine for LLM context. Error wrapping is clear.
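The script relies on the `front-matter` package for this step. As a dependency-free illustration of the same split (not that package's implementation), a hypothetical `splitFrontmatter` helper separates a leading `---` block from the Markdoc body; it returns the raw frontmatter text and does not parse YAML.

```typescript
// Matches a leading block: an opening '---' line, its content, and a closing '---' line.
const FRONTMATTER_RE = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/;

// Hypothetical helper: split raw frontmatter text from the Markdoc body.
function splitFrontmatter(source: string): { frontmatter: string; body: string } {
  const match = FRONTMATTER_RE.exec(source);
  if (!match) {
    // No frontmatter: the whole file is body text.
    return { frontmatter: "", body: source };
  }
  return { frontmatter: match[1], body: source.slice(match[0].length) };
}
```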

coderabbitai · 2025-09-11T23:46:21Z

scripts/llm-generate-description.ts

+async function generateDescription({
+  articleText,
+  frontmatterAttributes,
+}: {
+  articleText: string;
+  frontmatterAttributes: Record<
+    string,
+    string | number | boolean | Date | string[] | number[] | boolean[]
+  >;
+}) {
+  const systemPrompt = `You are an expert at writing SEO-optimized page descriptions for technical documentation websites targeting senior software engineers.
+
+Generate a concise, professional description (maximum 250 characters) that:
+- Accurately summarizes the technical content
+- Uses natural language with standard punctuation (use regular hyphens, not em dashes)
+- Speaks directly to experienced developers and engineering leaders
+- Includes relevant technical keywords for SEO
+- Avoids AI-generated language patterns or marketing fluff
+- Uses a professional, authoritative tone that resonates with senior engineers
+
+The description should be suitable for use in HTML meta descriptions and social media previews.`;
+
+  const userPrompt = `Generate a page description for this documentation page:
+
+Title: ${frontmatterAttributes?.title || "Untitled"}
+Summary: ${frontmatterAttributes?.summary || "No summary provided"}
+
+Content:
+${articleText}
+
+Generate a description that captures the essence of this page in 250 characters or less.`;
+


🛠️ Refactor suggestion

Enforce true ≤250-character limit: normalize, count code points, and clamp after retry.

Prevents Unicode miscounts and returning >250 chars even after retry.

 async function generateDescription({
   articleText,
   frontmatterAttributes,
 }: {
   articleText: string;
   frontmatterAttributes: Record<
     string,
     string | number | boolean | Date | string[] | number[] | boolean[]
   >;
 }) {
-  const systemPrompt = `You are an expert at writing SEO-optimized page descriptions for technical documentation websites targeting senior software engineers.
+  const MAX_CHARS = 250;
+  const systemPrompt = `You are an expert at writing SEO-optimized page descriptions for technical documentation websites targeting senior software engineers.

-Generate a concise, professional description (maximum 250 characters) that:
+Generate a concise, professional description (maximum ${MAX_CHARS} characters) that:
 - Accurately summarizes the technical content
 - Uses natural language with standard punctuation (use regular hyphens, not em dashes)
 - Speaks directly to experienced developers and engineering leaders
 - Includes relevant technical keywords for SEO
 - Avoids AI-generated language patterns or marketing fluff
 - Uses a professional, authoritative tone that resonates with senior engineers

 The description should be suitable for use in HTML meta descriptions and social media previews.`;

   const userPrompt = `Generate a page description for this documentation page:

 Title: ${frontmatterAttributes?.title || "Untitled"}
 Summary: ${frontmatterAttributes?.summary || "No summary provided"}

 Content:
 ${articleText}

-Generate a description that captures the essence of this page in 250 characters or less.`;
+Generate a description that captures the essence of this page in ${MAX_CHARS} characters or less.`;

   try {
     const { text: description } = await generateText({
       model: openai("gpt-4o-mini"),
       system: systemPrompt,
       prompt: userPrompt,
       maxTokens: 100,
     });

-    const trimmedDescription = description.trim();
-    const characterCount = trimmedDescription.length;
+    const normalized = description.replace(/\s+/g, " ").trim();
+    const characterCount = Array.from(normalized).length;

     // If the description is too long, try again with a more specific prompt
-    if (characterCount > 250) {
-      const retryPrompt = `The previous description was too long (${characterCount} characters). Generate a shorter description (maximum 250 characters) for this page:
+    if (characterCount > MAX_CHARS) {
+      const retryPrompt = `The previous description was too long (${characterCount} characters). Generate a shorter description (maximum ${MAX_CHARS} characters) for this page:

 Title: ${frontmatterAttributes?.title || "Untitled"}

 Content:
 ${articleText.substring(0, 500)}...

-Make it concise and under 250 characters.`;
+Make it concise and under ${MAX_CHARS} characters.`;

       const { text: retryDescription } = await generateText({
         model: openai("gpt-4o-mini"),
         system: systemPrompt,
         prompt: retryPrompt,
         maxTokens: 80,
       });

-      const finalDescription = retryDescription.trim();
-      return {
-        description: finalDescription,
-        characterCount: finalDescription.length,
-      };
+      let final = retryDescription.replace(/\s+/g, " ").trim();
+      let finalCount = Array.from(final).length;
+      if (finalCount > MAX_CHARS) {
+        final = Array.from(final).slice(0, MAX_CHARS).join("");
+        finalCount = MAX_CHARS;
+      }
+      return { description: final, characterCount: finalCount };
     }

-    return { description: trimmedDescription, characterCount };
+    return { description: normalized, characterCount };
   } catch (error) {
     throw new Error(
       `Failed to generate description: ${error instanceof Error ? error.message : "Unknown error"}`,
     );
   }
 }

Also applies to: 56-91
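A minimal sketch of the counting and clamping the suggestion relies on. `countCodePoints` and `clampDescription` are hypothetical helpers; `Array.from` iterates a string by Unicode code point, so the clamp never splits a surrogate pair, though a multi-code-point emoji sequence still counts as several characters.

```typescript
const MAX_CHARS = 250;

// Count Unicode code points rather than UTF-16 code units:
// "🚀".length is 2, but Array.from("🚀").length is 1.
function countCodePoints(text: string): number {
  return Array.from(text).length;
}

// Collapse whitespace, then clamp to at most `max` code points.
function clampDescription(
  raw: string,
  max: number = MAX_CHARS,
): { description: string; characterCount: number } {
  const normalized = raw.replace(/\s+/g, " ").trim();
  const codePoints = Array.from(normalized);
  if (codePoints.length <= max) {
    return { description: normalized, characterCount: codePoints.length };
  }
  // Drop a trailing space left by the cut so the count matches the visible text.
  const clipped = codePoints.slice(0, max).join("").trimEnd();
  return { description: clipped, characterCount: countCodePoints(clipped) };
}
```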

feat: generate page description with llm script

a4900d4

adityaoberai requested review from ItzNotABug and TorstenDittmann September 4, 2025 18:46

arielweinberger and others added 6 commits September 4, 2025 13:47

feat: expose reusable function

36c5475

do not run main() if imported

ba95757

better looking output

b0cd081

fix output

37da5ed

fix: formatting

7204134

Merge branch 'main' into feat/generate-llm-page-description-script

0ac69b2

ItzNotABug requested changes Sep 5, 2025


coderabbitai bot reviewed Sep 5, 2025


improvements to the llm generator

ea9b789

coderabbitai bot reviewed Sep 11, 2025


tessamero mentioned this pull request Sep 12, 2025

feat: generate page description with llm script (improvements from original PR) #2389

Open
