@arielweinberger commented Sep 4, 2025

How to use this script

First, install dependencies:

pnpm i

Ensure you have an OPENAI_API_KEY set.

Then:

pnpm run generate:page-description -- --file-path ./src/routes/docs/quick-starts/react/+page.markdoc

Output:

> tsx ./scripts/llm-generate-description.ts -- --file-path ./src/routes/docs/quick-starts/react/+page.markdoc

Generating description...
Description generated successfully (30 characters)
Generated description (30 characters):

Start building React apps with Appwrite: set up a Vite project, install the Appwrite JS SDK, configure your project, and add email/password authentication (register, login, logout) using Appwrite's Account API.

Summary by CodeRabbit

  • New Features
    • Added a CLI tool to generate SEO-friendly page descriptions for documentation pages.
  • Refactor
    • Centralized the documentation/Markdoc schema configuration used by the preprocessing pipeline.
  • Chores
    • Added a package script to run the generation workflow and new development dependencies to support it.

appwrite bot commented Sep 4, 2025

appwrite.io

Project ID: 684969cb000a2f6c0a02

Sites (1)
Site: website (ID 68496a17000f03d62013)
Status: Building

Note

Cursor pagination performs better than offset pagination when loading further pages.

Contributor

coderabbitai bot commented Sep 4, 2025

Walkthrough

Adds a CLI to generate SEO-focused page descriptions: new script generate:page-description runs scripts/llm-generate-description.ts, which reads a Markdoc file, parses frontmatter via front-matter, and returns articleText plus frontmatter attributes. The module exports getDocPageContent and generateDescriptionForDocsPage(filePath, { skipIfExists }). It calls OpenAI (gpt-4o-mini) to produce ≤250-char descriptions with a retry if too long, supports --file-path, --skip-existing, and --help, and prints/logs results. svelte.config.js now exports markdocSchema and uses it in preprocessing. DevDependencies for parsing, Markdoc, OpenAI, and CLI runtime were added.
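
The flag handling described in the walkthrough could be sketched roughly as below. This is a minimal sketch, not the PR's actual parser (which is not shown in this thread): `parseCliArgs` and `CliOptions` are hypothetical names, and it assumes Node's built-in `parseArgs` from `node:util` is available. Note the leading `--` that pnpm forwards, visible in the usage output above.

```typescript
import { parseArgs } from 'node:util';

interface CliOptions {
    filePath?: string;
    skipExisting: boolean;
    help: boolean;
}

// Hypothetical helper; the PR's own argument parsing is not shown here.
function parseCliArgs(argv: string[]): CliOptions {
    // `pnpm run <script> -- --flag` forwards a literal `--`. parseArgs treats
    // that as "end of options", so drop it when it leads the list.
    const args = argv[0] === '--' ? argv.slice(1) : argv;
    const { values } = parseArgs({
        args,
        options: {
            'file-path': { type: 'string' },
            'skip-existing': { type: 'boolean' },
            help: { type: 'boolean' }
        },
        strict: true,
        allowPositionals: false
    });
    return {
        filePath: values['file-path'],
        skipExisting: values['skip-existing'] ?? false,
        help: values.help ?? false
    };
}
```

In the real script this would presumably be called with `process.argv.slice(2)`.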

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title succinctly and accurately describes the primary change: adding an LLM-backed script to generate page descriptions. It follows conventional commit style ("feat:") and is concise without extraneous details. The title directly reflects the changes made to package.json and scripts/llm-generate-description.ts.


    previousAttempt
}: {
    articleText: string;
    frontmatterAttributes: Record<string, any>;
Member

Can we avoid any here if possible? Our frontmatter values are at most strings, dates, or integers.
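
One way to act on this, sketched under the assumption that frontmatter values really are limited to the kinds listed above. `FrontmatterValue`, `FrontmatterAttributes`, and `getStringAttribute` are illustrative names, not identifiers from the PR.

```typescript
// Assumed value shapes, based on the comment above: strings, dates, integers
// (plus booleans and string lists, which are a guess).
type FrontmatterValue = string | number | boolean | Date | string[];
type FrontmatterAttributes = Record<string, FrontmatterValue | undefined>;

// Narrow a single attribute to a non-empty string without reaching for `any`.
function getStringAttribute(attrs: FrontmatterAttributes, key: string): string | undefined {
    const value = attrs[key];
    return typeof value === 'string' && value.trim() !== '' ? value : undefined;
}
```

With a union like this, the compiler forces a `typeof` check before a value is interpolated into a prompt, which is the main thing `any` was hiding.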

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (3)
svelte.config.js (1)

8-25: Good centralization; consider moving schema to a standalone module to avoid loading SvelteKit config at runtime.

Importing svelte.config.js from scripts will evaluate adapters and other config. Extracting markdocSchema to e.g. src/markdoc/schema.js reduces coupling and speeds CLIs.

Example (new file):

// src/markdoc/schema.js
import { dirname, join } from 'path';
import { fileURLToPath } from 'url';

const absolute = (p) => join(dirname(fileURLToPath(import.meta.url)), '..', '..', p);

export const markdocSchema = {
  generateSchema: true,
  nodes: absolute('src/markdoc/nodes/_Module.svelte'),
  tags: absolute('src/markdoc/tags/_Module.svelte'),
  partials: absolute('src/partials'),
  layouts: {
    default: absolute('src/markdoc/layouts/Article.svelte'),
    article: absolute('src/markdoc/layouts/Article.svelte'),
    tutorial: absolute('src/markdoc/layouts/Tutorial.svelte'),
    post: absolute('src/markdoc/layouts/Post.svelte'),
    partner: absolute('src/markdoc/layouts/Partner.svelte'),
    author: absolute('src/markdoc/layouts/Author.svelte'),
    category: absolute('src/markdoc/layouts/Category.svelte'),
    policy: absolute('src/markdoc/layouts/Policy.svelte'),
    changelog: absolute('src/markdoc/layouts/Changelog.svelte'),
    integration: absolute('src/markdoc/layouts/Integration.svelte')
  }
};

Then:

  • In svelte.config.js: import { markdocSchema } from './src/markdoc/schema.js'
  • In the script: import { markdocSchema } from '../src/markdoc/schema.js'
scripts/llm-generate-description.ts (2)

139-149: Include character count in the final output to match logs/PR summary.

Apply:

-    console.log(`Generated description:\n\n${description}\n`);
+    console.log(`Generated description (${characterCount} characters):\n\n${description}\n`);

31-77: Optional: tighten the prompt for single-line meta output.

Ask for a single line, no surrounding quotes, and avoid trailing period if already at limit.

Apply:

           - Output must be worthy of being used as a meta description.
+          - Output must be a single line with no surrounding quotes.
+          - Prefer <= ${MAX_CHARS} characters; do not exceed. If needed, omit trailing punctuation to fit.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3da86c9 and 0ac69b2.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (3)
  • package.json (2 hunks)
  • scripts/llm-generate-description.ts (1 hunks)
  • svelte.config.js (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: tests
🔇 Additional comments (4)
svelte.config.js (1)

33-33: LGTM: reusing the centralized schema in preprocess.

package.json (2)

16-16: Script entry is correct; note CLI will currently no-op due to a direct-execution check bug in the script.

See suggested fix in scripts/llm-generate-description.ts to make pnpm run generate:page-description actually invoke main().


67-73: AI/LLM deps added as devDependencies — confirm install context.

If this CLI is ever run in a prod/CI environment with --prod, these won’t be installed. If that’s intended, fine; otherwise promote to dependencies.

scripts/llm-generate-description.ts (1)

31-33: Model ID validated
“gpt-5-mini” is supported by @ai-sdk/openai and is the recommended compact option for short SEO copy.

Comment on lines 14 to 16
if (!process.env.OPENAI_API_KEY) {
    throw new Error('OPENAI_API_KEY is not set');
}

⚠️ Potential issue

Do not throw at import-time if OPENAI_API_KEY is missing.

Top-level throw breaks importing the module (tests, tooling). Validate inside main() (or right before the LLM call) instead.

Apply:

-if (!process.env.OPENAI_API_KEY) {
-    throw new Error('OPENAI_API_KEY is not set');
-}

And add after the file-path check in main():

   if (!filePathArg) {
       throw new Error('File path is required');
   }
+  if (!process.env.OPENAI_API_KEY) {
+      throw new Error('OPENAI_API_KEY is required');
+  }
🤖 Prompt for AI Agents
In scripts/llm-generate-description.ts around lines 14-16, do not throw at
module import when OPENAI_API_KEY is missing; move the environment validation
into main() (or right before the LLM call) so importing the module
(tests/tooling) won't crash. Remove the top-level throw and instead add a check
after the file-path check in main(): if OPENAI_API_KEY is missing, log a clear
error and throw or exit there before any LLM calls so the module loads safely
but runtime fails fast when actually executed.
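
A minimal sketch of the call-time check, assuming a small helper is acceptable. `requireOpenAIKey` is a hypothetical name; the suggestion above simply inlines the check in main().

```typescript
// Hypothetical helper: reads the key only when called, never at import time,
// so importing the module from tests or tooling cannot throw.
function requireOpenAIKey(env: Record<string, string | undefined> = process.env): string {
    const key = env.OPENAI_API_KEY;
    if (!key) {
        throw new Error('OPENAI_API_KEY is required');
    }
    return key;
}
```

Passing the environment as a parameter is a design choice for testability; calling `requireOpenAIKey()` with no argument falls back to `process.env`.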

Comment on lines 18 to 29
async function generateDescription({
    articleText,
    frontmatterAttributes,
    previousAttempt
}: {
    articleText: string;
    frontmatterAttributes: Record<string, any>;
    previousAttempt?: {
        text: string;
        characterCount: number;
    };
}) {

🛠️ Refactor suggestion

Unbounded recursion risk; cap attempts.

Add an attempt counter with a sane max; fall back by truncating on last attempt.

Apply:

-async function generateDescription({
+const MAX_CHARS = 250;
+const MAX_ATTEMPTS = 3;
+
+async function generateDescription({
     articleText,
     frontmatterAttributes,
-    previousAttempt
+    previousAttempt,
+    attempt = 1
 }: {
     articleText: string;
     frontmatterAttributes: Record<string, any>;
     previousAttempt?: {
         text: string;
         characterCount: number;
-    };
+    };
+    attempt?: number;
 }) {

And update the retry block:

-    if (characterCount > 250) {
-        console.log(`Character count is too long (${characterCount}), generating again...`);
-        return generateDescription({
+    if (characterCount > MAX_CHARS) {
+        if (attempt >= MAX_ATTEMPTS) {
+            const clipped = normalized.slice(0, MAX_CHARS);
+            return { description: clipped, characterCount: MAX_CHARS };
+        }
+        console.log(`Over limit (${characterCount} chars). Regenerating (attempt ${attempt + 1}/${MAX_ATTEMPTS})...`);
+        return generateDescription({
             articleText,
             frontmatterAttributes,
             previousAttempt: {
-                text,
-                characterCount
-            }
+                text: normalized,
+                characterCount
+            },
+            attempt: attempt + 1
         });
     }
🤖 Prompt for AI Agents
In scripts/llm-generate-description.ts around lines 18-29, the
generateDescription function lacks a bounded retry/recursion mechanism; add an
attempt counter parameter (e.g., attempt = 0) and a MAX_ATTEMPTS constant (e.g.,
3-5). On each retry increment attempt and, if attempt >= MAX_ATTEMPTS, stop
recursing and fall back by truncating articleText to a safe length (or use
previousAttempt.text truncated) before making the final call; otherwise retry
normally. Update the retry block to pass attempt + 1 on recursive calls and to
perform the truncation fallback when the max is reached. Ensure types/signature
reflect the new optional attempt parameter.
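
The same cap can also be written as a loop rather than recursion. The sketch below is an illustration under assumptions, not the PR's code: `generateWithinLimit` is a hypothetical name, and `generate` stands in for the LLM call so the retry logic can be exercised without network access.

```typescript
const MAX_CHARS = 250;
const MAX_ATTEMPTS = 3;

interface Attempt {
    text: string;
    characterCount: number;
}

// `generate` receives the previous over-long attempt (if any) so the prompt
// can ask for something shorter.
async function generateWithinLimit(
    generate: (previous?: Attempt) => Promise<string>,
    maxChars = MAX_CHARS,
    maxAttempts = MAX_ATTEMPTS
): Promise<{ description: string; characterCount: number }> {
    let previous: Attempt | undefined;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const text = (await generate(previous)).trim();
        if (text.length <= maxChars) {
            return { description: text, characterCount: text.length };
        }
        previous = { text, characterCount: text.length };
    }
    // Every attempt was too long: clip the last one instead of trying again.
    const clipped = (previous?.text ?? '').slice(0, maxChars);
    return { description: clipped, characterCount: clipped.length };
}
```

A loop keeps the attempt counter out of the public signature and makes the worst case (at most `maxAttempts` calls) easy to see.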

Comment on lines 114 to 121
    // Use JSDOM to parse the HTML and extract text content from <article>
    const dom = new JSDOM(html);
    const articleElement = dom.window.document.querySelector('article');
    const articleText = articleElement ? articleElement.textContent : '';
    return {
        articleText,
        frontmatterAttributes: frontmatter.attributes
    };

🛠️ Refactor suggestion

Make article extraction resilient and normalized.

Fallback to main/body, and collapse whitespace.

Apply:

-    const dom = new JSDOM(html);
-    const articleElement = dom.window.document.querySelector('article');
-    const articleText = articleElement ? articleElement.textContent : '';
+    const dom = new JSDOM(html);
+    const articleElement =
+        dom.window.document.querySelector('article, main') ?? dom.window.document.body;
+    const articleText = (articleElement?.textContent || '').replace(/\s+/g, ' ').trim();
🤖 Prompt for AI Agents
In scripts/llm-generate-description.ts around lines 114 to 121, the article
extraction should be more resilient and normalized: update the DOM query to
fallback to <main> then <body> when <article> is missing, extract textContent
from the first available element, collapse consecutive whitespace into single
spaces and trim leading/trailing whitespace, and return that normalized string
as articleText along with frontmatter.attributes.
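
If a strict article, then main, then body preference matters, note that a selector list such as 'article, main' returns the first match in document order rather than preferring article. A rough sketch of an ordered fallback follows; `extractNormalizedText` and the two interfaces are hypothetical, written structurally so the example runs without JSDOM (a JSDOM document should satisfy them).

```typescript
// Minimal structural types so the sketch runs without a DOM library.
interface TextNodeLike {
    textContent: string | null;
}
interface QueryRoot {
    querySelector(selector: string): TextNodeLike | null;
}

// Try each selector in priority order and normalize the first hit.
function extractNormalizedText(
    root: QueryRoot,
    selectors: string[] = ['article', 'main', 'body']
): string {
    for (const selector of selectors) {
        const element = root.querySelector(selector);
        if (element) {
            // Collapse runs of whitespace into single spaces and trim the ends.
            return (element.textContent ?? '').replace(/\s+/g, ' ').trim();
        }
    }
    return '';
}
```

In the script this would presumably be called as `extractNormalizedText(dom.window.document)`.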

Comment on lines 124 to 137
export async function generateDescriptionForDocsPage(filePath: string) {
    const resolvedPath = path.resolve(filePath);
    const { articleText, frontmatterAttributes } = await getDocPageContent(resolvedPath);

    if (!articleText || !frontmatterAttributes) {
        throw new Error('Article text or frontmatter attributes are undefined');
    }

    const { description, characterCount } = await generateDescription({
        articleText,
        frontmatterAttributes
    });
    return { description, characterCount };
}

🛠️ Refactor suggestion

Gracefully handle empty article text; use frontmatter as fallback.

Avoid hard failure when the page has little/no static text.

Apply:

-    const { articleText, frontmatterAttributes } = await getDocPageContent(resolvedPath);
-
-    if (!articleText || !frontmatterAttributes) {
-        throw new Error('Article text or frontmatter attributes are undefined');
-    }
-
-    const { description, characterCount } = await generateDescription({
-        articleText,
-        frontmatterAttributes
-    });
+    const { articleText, frontmatterAttributes } = await getDocPageContent(resolvedPath);
+    const pageContent =
+        articleText ||
+        [frontmatterAttributes?.title, frontmatterAttributes?.description, frontmatterAttributes?.summary]
+            .filter(Boolean)
+            .join(': ');
+    if (!pageContent) {
+        throw new Error('No content available to generate a description');
+    }
+    const { description, characterCount } = await generateDescription({
+        articleText: pageContent,
+        frontmatterAttributes
+    });
🤖 Prompt for AI Agents
In scripts/llm-generate-description.ts around lines 124 to 137, the function
currently throws if articleText or frontmatterAttributes are missing; change it
to gracefully handle empty article text by using frontmatter as a fallback:
check if articleText is falsy or trim().length === 0, and if so construct a
fallback source (prefer frontmatterAttributes.description, else
frontmatterAttributes.title and any summary fields joined) and pass that into
generateDescription; only throw if frontmatterAttributes is also entirely
missing and no fallback can be constructed, then return the resulting
description and characterCount as before.
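
The fallback can be isolated into a small pure function, which makes it easy to test apart from the LLM call. This is a sketch only: `pickSourceText` and `asText` are hypothetical names, and the field order (title, description, summary) follows the diff above rather than anything confirmed in the final script.

```typescript
type Frontmatter = Record<string, unknown>;

// Treat anything that is not a string as empty.
function asText(value: unknown): string {
    return typeof value === 'string' ? value.trim() : '';
}

// Returns the text to summarize: the article body when present, otherwise
// whatever the frontmatter offers. Throws only when both are empty.
function pickSourceText(
    articleText: string | null | undefined,
    frontmatter: Frontmatter = {}
): string {
    const body = (articleText ?? '').trim();
    if (body) return body;
    const fallback = [frontmatter.title, frontmatter.description, frontmatter.summary]
        .map(asText)
        .filter(Boolean)
        .join(': ');
    if (!fallback) {
        throw new Error('No content available to generate a description');
    }
    return fallback;
}
```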

Comment on lines 151 to 159
// Runs only if invoked via CLI
// @ts-ignore
const isDirect = import.meta.url === pathToFileURL(process.argv[1]).href;
if (isDirect) {
    main().catch((err) => {
        console.error(err);
        process.exit(1);
    });
}

⚠️ Potential issue

Fix direct-execution check for tsx.

With pnpm run generate:page-description (tsx), process.argv[1] is the tsx binary, so main() never runs.

Apply:

-const isDirect = import.meta.url === pathToFileURL(process.argv[1]).href;
+const scriptPath = process.argv[1]?.endsWith('tsx') ? process.argv[2] : process.argv[1];
+const isDirect =
+  !!scriptPath && import.meta.url === pathToFileURL(path.resolve(scriptPath)).href;
🤖 Prompt for AI Agents
In scripts/llm-generate-description.ts around lines 151–159, the
direct-execution check fails when run via tsx because process.argv[1] points to
the tsx binary rather than the script; update the check to locate the actual
script arg (search process.argv for the argument whose basename matches this
script like llm-generate-description.ts or .tsx), convert that found arg to a
file:// URL with pathToFileURL and compare it to import.meta.url; if no matching
arg is found fall back to the existing check so main() runs correctly when
invoked directly by node or tsx.
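
The argv search described in the prompt above might look like the following. Treat it as a sketch under assumptions: `isDirectRun` is a hypothetical helper, and `moduleUrl` is expected to be `import.meta.url` in the real script. It is passed in as a parameter here so the logic can be tested in isolation.

```typescript
import * as path from 'node:path';
import { fileURLToPath, pathToFileURL } from 'node:url';

// Hypothetical helper: true when one of the CLI arguments names this module.
function isDirectRun(moduleUrl: string, argv: string[] = process.argv): boolean {
    const scriptName = path.basename(fileURLToPath(moduleUrl));
    // Look past the runtime binary for the argument whose basename matches
    // this file; fall back to argv[1], which is what the original check used.
    const candidate =
        argv.slice(1).find((arg) => path.basename(arg) === scriptName) ?? argv[1];
    if (!candidate) return false;
    return pathToFileURL(path.resolve(candidate)).href === moduleUrl;
}
```

Usage would then be `if (isDirectRun(import.meta.url)) { main().catch(...) }`, regardless of which position the script path occupies in `process.argv`.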

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
scripts/llm-generate-description.ts (2)

187-194: Fail fast with a clear error if OPENAI_API_KEY is missing.

Surface a helpful message before making the API call.

   if (!filePathArg) {
     throw new Error("File path is required. Use --help for usage information.");
   }
 
+  if (!process.env.OPENAI_API_KEY) {
+    throw new Error("OPENAI_API_KEY is required to generate a page description");
+  }
+
   const resolvedPath = path.resolve(filePathArg);
   const { description, characterCount, skipped } =
     await generateDescriptionForDocsPage(resolvedPath, { skipIfExists });

209-216: Fix CLI direct-execution check to work with pnpm/tsx/ts-node.

Current endsWith check can miss when invoked via tsx, so main() may not run.

-const isDirect =
-  process.argv[1] && process.argv[1].endsWith("llm-generate-description.ts");
-if (isDirect) {
+const argv = process.argv.slice(1);
+const candidate = argv.find((a) =>
+  /llm-generate-description\.(t|j)sx?$/i.test(a)
+);
+const scriptPath = candidate ? path.resolve(candidate) : undefined;
+const isDirect =
+  !!scriptPath && import.meta.url === pathToFileURL(scriptPath).href;
+if (isDirect) {
   main().catch((err) => {
     console.error(err);
     process.exit(1);
   });
 }
🧹 Nitpick comments (1)
scripts/llm-generate-description.ts (1)

191-205: Optional: Add small progress logs to match the PR’s usage examples.

Improves UX parity (“Generating…”, “Generated successfully (N characters)”).

-  const resolvedPath = path.resolve(filePathArg);
-  const { description, characterCount, skipped } =
+  const resolvedPath = path.resolve(filePathArg);
+  console.log("Generating description...");
+  const { description, characterCount, skipped } =
     await generateDescriptionForDocsPage(resolvedPath, { skipIfExists });
 
   if (skipped) {
     console.log("✅ File skipped - description already exists");
     return;
   }
 
+  console.log(`Description generated successfully (${characterCount} characters)`);
   console.log(
     `================ DESCRIPTION START (character count: ${characterCount}) =================`,
   );
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ac69b2 and ea9b789.

📒 Files selected for processing (1)
  • scripts/llm-generate-description.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: tests
  • GitHub Check: build
🔇 Additional comments (1)
scripts/llm-generate-description.ts (1)

98-117: LGTM: simple, robust Markdoc frontmatter + raw body extraction.

Using raw Markdoc body avoids JSDOM/renderer complexity and is fine for LLM context. Error wrapping is clear.

Comment on lines +24 to +55
async function generateDescription({
  articleText,
  frontmatterAttributes,
}: {
  articleText: string;
  frontmatterAttributes: Record<
    string,
    string | number | boolean | Date | string[] | number[] | boolean[]
  >;
}) {
const systemPrompt = `You are an expert at writing SEO-optimized page descriptions for technical documentation websites targeting senior software engineers.

Generate a concise, professional description (maximum 250 characters) that:
- Accurately summarizes the technical content
- Uses natural language with standard punctuation (use regular hyphens, not em dashes)
- Speaks directly to experienced developers and engineering leaders
- Includes relevant technical keywords for SEO
- Avoids AI-generated language patterns or marketing fluff
- Uses a professional, authoritative tone that resonates with senior engineers

The description should be suitable for use in HTML meta descriptions and social media previews.`;

const userPrompt = `Generate a page description for this documentation page:

Title: ${frontmatterAttributes?.title || "Untitled"}
Summary: ${frontmatterAttributes?.summary || "No summary provided"}

Content:
${articleText}

Generate a description that captures the essence of this page in 250 characters or less.`;


🛠️ Refactor suggestion

Enforce true ≤250-character limit: normalize, count code points, and clamp after retry.

Prevents Unicode miscounts and returning >250 chars even after retry.

 async function generateDescription({
   articleText,
   frontmatterAttributes,
 }: {
   articleText: string;
   frontmatterAttributes: Record<
     string,
     string | number | boolean | Date | string[] | number[] | boolean[]
   >;
 }) {
-  const systemPrompt = `You are an expert at writing SEO-optimized page descriptions for technical documentation websites targeting senior software engineers.
+  const MAX_CHARS = 250;
+  const systemPrompt = `You are an expert at writing SEO-optimized page descriptions for technical documentation websites targeting senior software engineers.
 
-Generate a concise, professional description (maximum 250 characters) that:
+Generate a concise, professional description (maximum ${MAX_CHARS} characters) that:
 - Accurately summarizes the technical content
 - Uses natural language with standard punctuation (use regular hyphens, not em dashes)
 - Speaks directly to experienced developers and engineering leaders
 - Includes relevant technical keywords for SEO
 - Avoids AI-generated language patterns or marketing fluff
 - Uses a professional, authoritative tone that resonates with senior engineers
 
 The description should be suitable for use in HTML meta descriptions and social media previews.`;
 
   const userPrompt = `Generate a page description for this documentation page:
 
 Title: ${frontmatterAttributes?.title || "Untitled"}
 Summary: ${frontmatterAttributes?.summary || "No summary provided"}
 
 Content:
 ${articleText}
 
-Generate a description that captures the essence of this page in 250 characters or less.`;
+Generate a description that captures the essence of this page in ${MAX_CHARS} characters or less.`;
 
   try {
     const { text: description } = await generateText({
       model: openai("gpt-4o-mini"),
       system: systemPrompt,
       prompt: userPrompt,
       maxTokens: 100,
     });
 
-    const trimmedDescription = description.trim();
-    const characterCount = trimmedDescription.length;
+    const normalized = description.replace(/\s+/g, " ").trim();
+    const characterCount = Array.from(normalized).length;
 
     // If the description is too long, try again with a more specific prompt
-    if (characterCount > 250) {
-      const retryPrompt = `The previous description was too long (${characterCount} characters). Generate a shorter description (maximum 250 characters) for this page:
+    if (characterCount > MAX_CHARS) {
+      const retryPrompt = `The previous description was too long (${characterCount} characters). Generate a shorter description (maximum ${MAX_CHARS} characters) for this page:
 
 Title: ${frontmatterAttributes?.title || "Untitled"}
 Content: ${articleText.substring(0, 500)}...
 
-Make it concise and under 250 characters.`;
+Make it concise and under ${MAX_CHARS} characters.`;
 
       const { text: retryDescription } = await generateText({
         model: openai("gpt-4o-mini"),
         system: systemPrompt,
         prompt: retryPrompt,
         maxTokens: 80,
       });
 
-      const finalDescription = retryDescription.trim();
-      return {
-        description: finalDescription,
-        characterCount: finalDescription.length,
-      };
+      let final = retryDescription.replace(/\s+/g, " ").trim();
+      let finalCount = Array.from(final).length;
+      if (finalCount > MAX_CHARS) {
+        final = Array.from(final).slice(0, MAX_CHARS).join("");
+        finalCount = MAX_CHARS;
+      }
+      return { description: final, characterCount: finalCount };
     }
 
-    return { description: trimmedDescription, characterCount };
+    return { description: normalized, characterCount };
   } catch (error) {
     throw new Error(
       `Failed to generate description: ${error instanceof Error ? error.message : "Unknown error"}`,
     );
   }
 }

Also applies to: 56-91
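
The normalize, count, and clamp steps from the diff above can be pulled into one helper. A minimal sketch, assuming code points are the unit that matters here (`normalizeAndClamp` is a hypothetical name); note that `Array.from` splits by code point, not by grapheme cluster, so some emoji sequences still count as more than one.

```typescript
const MAX_CHARS = 250;

// Collapse whitespace, then count Unicode code points rather than UTF-16 units.
function normalizeAndClamp(
    raw: string,
    maxChars: number = MAX_CHARS
): { description: string; characterCount: number } {
    const normalized = raw.replace(/\s+/g, ' ').trim();
    const codePoints = Array.from(normalized);
    if (codePoints.length <= maxChars) {
        return { description: normalized, characterCount: codePoints.length };
    }
    // trimEnd avoids a dangling space when the cut lands right after a word.
    const description = codePoints.slice(0, maxChars).join('').trimEnd();
    return { description, characterCount: Array.from(description).length };
}
```

Slicing the code point array rather than the string avoids cutting a surrogate pair in half, which `String.prototype.slice` can do.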
