36 changes: 18 additions & 18 deletions apps/web/client/public/onlook-preload-script.js

Large diffs are not rendered by default.

@@ -31,7 +31,9 @@ import {
WEB_SEARCH_TOOL_NAME,
type WEB_SEARCH_TOOL_PARAMETERS,
WRITE_FILE_TOOL_NAME,
type WRITE_FILE_TOOL_PARAMETERS
type WRITE_FILE_TOOL_PARAMETERS,
CLONE_WEBSITE_TOOL_NAME,
type CLONE_WEBSITE_TOOL_PARAMETERS,
} from '@onlook/ai';
import { Icons } from '@onlook/ui/icons';
import { cn } from '@onlook/ui/utils';
@@ -60,6 +62,7 @@ const TOOL_ICONS: Record<string, any> = {
[TYPECHECK_TOOL_NAME]: Icons.MagnifyingGlass,
[LIST_BRANCHES_TOOL_NAME]: Icons.Commit,
[GLOB_TOOL_NAME]: Icons.MagnifyingGlass,
[CLONE_WEBSITE_TOOL_NAME]: Icons.Globe,
} as const;

function truncateString(str: string, maxLength: number = 30) {
@@ -192,6 +195,18 @@ export function ToolCallSimple({
return 'Reading Onlook instructions';
case TYPECHECK_TOOL_NAME:
return 'Checking types';
case CLONE_WEBSITE_TOOL_NAME:
const params13 = toolInvocation.input as z.infer<typeof CLONE_WEBSITE_TOOL_PARAMETERS>;
if (params13?.url) {
try {
const url = new URL(params13.url);
Contributor

In the clone_website case, the URL parsing logic is duplicated. Consider refactoring it into a utility function for consistency and maintainability.
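
A minimal sketch of such a helper, under the assumption that it would live next to this component (the name getHostnameOrFallback is hypothetical, not part of this PR):

// Hypothetical shared helper for both URL-parsing call sites.
function getHostnameOrFallback(rawUrl: string | undefined, fallback: string): string {
    if (!rawUrl) return fallback;
    try {
        const { hostname } = new URL(rawUrl);
        return hostname || fallback;
    } catch {
        return fallback;
    }
}

The switch case would then reduce to: return 'Cloning ' + getHostnameOrFallback(params13?.url, 'website');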

return 'Cloning ' + (url.hostname || 'website');
} catch (error) {
return 'Cloning website';
}
} else {
return 'Cloning website';
}
Comment on lines +198 to +209

⚠️ Potential issue

Fix the switch case declaration issue.

The linter correctly identifies that params13 can be accessed by other switch cases. Wrap the declaration in a block.

 case CLONE_WEBSITE_TOOL_NAME:
+    {
     const params13 = toolInvocation.input as z.infer<typeof CLONE_WEBSITE_TOOL_PARAMETERS>;
     if (params13?.url) {
         try {
             const url = new URL(params13.url);
             return 'Cloning ' + (url.hostname || 'website');
         } catch (error) {
             return 'Cloning website';
         }
     } else {
         return 'Cloning website';
     }
+    }
🧰 Tools
🪛 Biome (2.1.2)

[error] 199-199: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

🤖 Prompt for AI Agents
In
apps/web/client/src/app/project/[id]/_components/right-panel/chat-tab/chat-messages/message-content/tool-call-simple.tsx
around lines 198 to 209, the const params13 declared inside the
CLONE_WEBSITE_TOOL_NAME case is leaking into other switch cases; wrap the case
body in its own block so the declaration is scoped locally (i.e., add an opening
{ immediately after the case label and a closing } before the case's
return/break), keep the existing try/catch/returns unchanged, and ensure no
extra fall-through occurs.

default:
return toolName?.replace(/[-_]/g, ' ').replace(/\b\w/g, c => c.toUpperCase());
}
79 changes: 77 additions & 2 deletions apps/web/client/src/components/tools/handlers/web.ts
@@ -1,9 +1,10 @@
import { api } from '@/trpc/client';
import {
type SCRAPE_URL_TOOL_PARAMETERS,
type WEB_SEARCH_TOOL_PARAMETERS
type WEB_SEARCH_TOOL_PARAMETERS,
type CLONE_WEBSITE_TOOL_PARAMETERS
} from '@onlook/ai';
import type { WebSearchResult } from '@onlook/models';
import type { CloneWebsiteResult, WebSearchResult } from '@onlook/models';
import { type z } from 'zod';

export async function handleScrapeUrlTool(
@@ -48,3 +49,77 @@ export async function handleWebSearchTool(
};
}
}

export async function handleCloneWebsiteTool(
args: z.infer<typeof CLONE_WEBSITE_TOOL_PARAMETERS>,
editorEngine: any,
): Promise<CloneWebsiteResult> {
// Store args in function scope for error handling
const requestUrl = args.url;
const branchId = args.branchId;


try {
const result = await api.code.cloneWebsite.mutate({
url: requestUrl,
});

if (!result.result) {
throw new Error(result.error || 'Failed to clone website');
}

const { markdown, html, designScreenshot, designDocument, assets } = result.result;

// Download assets into public/cloned-assets/
const baseDir = `public/cloned-assets/`;
const sandbox = editorEngine.branches.getSandboxById(branchId);
if (!sandbox) {
Comment on lines +74 to +76

🛠️ Refactor suggestion

Make asset directory branch-scoped and quote paths

Avoid cross-branch collisions and ensure safe path handling.

-        // Download assets into public/cloned-assets/
-        const baseDir = `public/cloned-assets/`;
+        // Download assets into public/cloned-assets/<branchId>/
+        const baseDir = `public/cloned-assets/${branchId}/`;
@@
-        await sandbox.session.runCommand(`mkdir -p ${baseDir}`);
+        await sandbox.session.runCommand(`mkdir -p "${baseDir}"`);

Also applies to: 89-89

🤖 Prompt for AI Agents
In apps/web/client/src/components/tools/handlers/web.ts around lines 74-76 (and
also at line 89), the asset directory is not branch-scoped and paths are
constructed unsafely; update baseDir to be branch-scoped (e.g. include a
sanitized/validated branchId segment such as public/cloned-assets/<branchId>/)
and replace raw string concatenation with safe path construction (use path.join
or equivalent) to prevent path traversal and collisions; ensure any path
segments are properly quoted/escaped when used in shell or file APIs and
validate/sanitize branchId to allow only safe characters.
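
A sketch of what that could look like; the allowed character set is an assumption, not taken from this PR:

// Restrict the branch segment to safe characters before building the path.
const safeBranchId = branchId.replace(/[^a-zA-Z0-9_-]/g, '-');
const baseDir = `public/cloned-assets/${safeBranchId}/`;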

console.warn('Sandbox not found for branch ID:', branchId);
return {
result: {
markdown: markdown,
html: html,
designScreenshot: designScreenshot,
designDocument: designDocument,
assets: assets,
},
error: null,
};
}
await sandbox.session.runCommand(`mkdir -p ${baseDir}`);

for (const asset of assets) {
Contributor

Consider parallelizing the asset downloads (e.g. using Promise.all) to improve performance when there are many assets.
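
A minimal sketch of the parallel version, reusing the filename and runCommand logic from this handler; Promise.allSettled is chosen so a single failed download does not reject the whole batch:

await Promise.allSettled(
    assets.map(async (asset) => {
        const safeBase = asset.title.replace(/\s+/g, '-').replace(/[^a-zA-Z0-9._-]/g, '-');
        const dest = `${baseDir}${safeBase}.png`;
        const download = await sandbox.session.runCommand(
            `curl -L --silent --fail --show-error "${asset.url}" -o "${dest}"`,
        );
        if (download.success) {
            (asset as any).fileLocation = dest;
        } else {
            console.warn(`Failed to download asset ${asset.url}:`, download.error);
        }
    }),
);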


const rawBase = asset.title;
const safeBase = rawBase.replace(/\s+/g, '-').replace(/[^a-zA-Z0-9._-]/g, '-');
const filename = safeBase + '.png';
const dest = `${baseDir}${filename}`;
try {
const download = await sandbox.session.runCommand(`curl -L --silent --fail --show-error "${asset.url}" -o "${dest}"`);
Contributor

The use of string interpolation in the curl command may pose a shell injection risk. Ensure that asset.url and dest are properly sanitized or escaped.
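
A minimal sketch of one mitigation, single-quoting each value for the shell (the same approach the bot suggestion below takes inline):

// Wrap a value in single quotes, escaping any embedded single quotes.
const shellQuote = (value: string): string => `'${value.replace(/'/g, `'\\''`)}'`;

// Usage:
// sandbox.session.runCommand(`curl -L --silent --fail --show-error ${shellQuote(asset.url)} -o ${shellQuote(dest)}`);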

if (download.success) {
// Attach saved location (relative public path) for UI usage
(asset as any).fileLocation = dest;
} else {
console.log('download failed', download.error);
console.warn(`Failed to download asset ${asset.url}:`, download.error);
}
Comment on lines +91 to +105

🛠️ Refactor suggestion

⚠️ Potential issue

Critical: command injection & SSRF risk in curl; also fix forced .png extension and return a web path

  • Interpolating an untrusted URL inside double quotes allows command substitution (e.g., $(...)) → RCE.
  • No scheme validation → SSRF vector (internal addresses, file://, etc.).
  • All assets saved as .png → broken asset types.
  • fileLocation should be a public web path (strip leading public/).
-        for (const asset of assets) {
-            
-            const rawBase = asset.title;
-            const safeBase = rawBase.replace(/\s+/g, '-').replace(/[^a-zA-Z0-9._-]/g, '-');
-            const filename = safeBase + '.png';
-            const dest = `${baseDir}${filename}`;
-            try {
-                const download = await sandbox.session.runCommand(`curl -L --silent --fail --show-error "${asset.url}" -o "${dest}"`);
-                if (download.success) {
-                    // Attach saved location (relative public path) for UI usage
-                    (asset as any).fileLocation = dest;
-                } else {
-                    console.log('download failed', download.error);
-                    console.warn(`Failed to download asset ${asset.url}:`, download.error);
-                }
-            } catch (error) {
-                console.warn(`Failed to download asset ${asset.url}:`, error);
-            }
-        }
+        for (const asset of assets) {
+            // Validate and normalize URL
+            let parsed: URL;
+            try {
+                parsed = new URL(asset.url);
+            } catch {
+                console.warn('Skipping asset with invalid URL:', asset?.url);
+                continue;
+            }
+            if (!/^https?:$/.test(parsed.protocol)) {
+                console.warn('Skipping non-http(s) asset:', asset.url);
+                continue;
+            }
+
+            // Safe filename (bounded length)
+            const rawBase = (asset.title || parsed.pathname.split('/').pop() || 'asset').slice(0, 64);
+            const safeBase = rawBase.replace(/\s+/g, '-').replace(/[^a-zA-Z0-9._-]/g, '-');
+            const last = parsed.pathname.split('/').pop() || '';
+            const idx = last.lastIndexOf('.');
+            const extFromPath = idx > -1 ? last.slice(idx + 1) : '';
+            const ext = /^[a-zA-Z0-9]+$/.test(extFromPath) ? extFromPath : 'bin';
+            const filename = `${safeBase}.${ext}`;
+            const dest = `${baseDir}${filename}`;
+
+            // Shell-safe quoting: single-quote and escape existing single quotes
+            const quotedUrl = `'${asset.url.replace(/'/g, `'\\''`)}'`;
+            const quotedDest = `'${dest.replace(/'/g, `'\\''`)}'`;
+            const cmd = `curl -L --silent --fail --show-error --connect-timeout 5 --max-time 30 --proto '=http,https' --proto-redir '=http,https' ${quotedUrl} -o ${quotedDest}`;
+
+            try {
+                const download = await sandbox.session.runCommand(cmd);
+                if (download.success) {
+                    // Attach served location (strip "public/")
+                    const webPath = dest.startsWith('public/') ? `/${dest.slice('public/'.length)}` : dest;
+                    (asset as any).fileLocation = webPath;
+                } else {
+                    console.warn(`Failed to download asset ${asset.url}:`, download.error);
+                }
+            } catch (error) {
+                console.warn(`Failed to download asset ${asset.url}:`, error);
+            }
+        }

Follow-ups:

  • Consider deduping filenames (e.g., suffix collisions or hash by URL).
  • If assets can include CSS/JS/fonts, optionally restrict extensions to an allowlist and skip others.
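
A sketch of the dedupe idea, assuming Node's crypto module is available in this context:

import { createHash } from 'crypto';

// Suffix a short URL hash so assets sharing a title still get distinct filenames.
function dedupedFilename(base: string, ext: string, url: string): string {
    const hash = createHash('sha256').update(url).digest('hex').slice(0, 8);
    return `${base}-${hash}.${ext}`;
}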

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In apps/web/client/src/components/tools/handlers/web.ts around lines 91-105, the
code currently interpolates untrusted asset.url into a shell command (risking
command injection and SSRF), forces a .png extension, and stores a file system
path instead of a public web path; to fix, stop invoking curl via shell
interpolation and instead fetch assets using a safe HTTP client or spawn a
subprocess with arguments (no shell), validate and allow only safe schemes
(http/https) and disallow internal/loopback IPs/hostnames to mitigate SSRF,
determine the correct extension from the response Content-Type or the URL path
(and apply an allowlist if needed), ensure filenames are sanitized and
deduplicated (e.g., append a short hash of the URL), write to the filesystem
under the public dir, then set fileLocation to the web-accessible path by
stripping the leading public/ prefix.

} catch (error) {
console.warn(`Failed to download asset ${asset.url}:`, error);
}
}

return {
result: {
markdown: markdown,
html: html,
designScreenshot: designScreenshot,
designDocument: designDocument,
assets: assets,
},
error: null,
};
} catch (error) {
console.error('Error cloning website:', error);
throw new Error(`Failed to clone website ${requestUrl}: ${error instanceof Error ? error.message : 'Unknown error'}`);
}
}
13 changes: 11 additions & 2 deletions apps/web/client/src/components/tools/tools.ts
@@ -35,7 +35,9 @@ import {
WEB_SEARCH_TOOL_NAME,
WEB_SEARCH_TOOL_PARAMETERS,
WRITE_FILE_TOOL_NAME,
WRITE_FILE_TOOL_PARAMETERS
WRITE_FILE_TOOL_PARAMETERS,
CLONE_WEBSITE_TOOL_NAME,
CLONE_WEBSITE_TOOL_PARAMETERS,
} from '@onlook/ai';
import { toast } from '@onlook/ui/sonner';
import { type z } from 'zod';
@@ -56,7 +58,8 @@ import {
handleTerminalCommandTool,
handleTypecheckTool,
handleWebSearchTool,
handleWriteFileTool
handleWriteFileTool,
handleCloneWebsiteTool,
} from './handlers';
import { EMPTY_TOOL_PARAMETERS } from './helpers';

@@ -174,6 +177,12 @@ const TOOL_HANDLERS: ClientToolMap = {
handler: async (args: z.infer<typeof CHECK_ERRORS_TOOL_PARAMETERS>, editorEngine: EditorEngine) =>
handleCheckErrors(args, editorEngine),
},
[CLONE_WEBSITE_TOOL_NAME]: {
name: CLONE_WEBSITE_TOOL_NAME,
inputSchema: CLONE_WEBSITE_TOOL_PARAMETERS,
handler: async (args: z.infer<typeof CLONE_WEBSITE_TOOL_PARAMETERS>, editorEngine: EditorEngine) =>
handleCloneWebsiteTool(args, editorEngine),
},
};

export async function handleToolCall(toolCall: ToolCall<string, unknown>, editorEngine: EditorEngine) {
137 changes: 136 additions & 1 deletion apps/web/client/src/server/api/routers/code.ts
@@ -1,10 +1,16 @@
import { env } from '@/env';
import FirecrawlApp from '@mendable/firecrawl-js';
import { applyCodeChange } from '@onlook/ai';
import type { WebSearchResult } from '@onlook/models';
import type { WebSearchResult, CloneWebsiteResult } from '@onlook/models';
import Exa from 'exa-js';
import { z } from 'zod';
import { createTRPCRouter, protectedProcedure } from '../trpc';
import { initModel } from '@onlook/ai';
import { LLMProvider, ANTHROPIC_MODELS } from '@onlook/models';
import { generateObject } from 'ai';
import { CLONE_WEBSITE_DESIGN_PROMPT } from '@onlook/ai/src/prompt/clone';
import { DesignSchema } from '@onlook/models/tools';
Comment on lines +11 to +12

💡 Verification agent

🧩 Analysis chain

Fix the import path for CLONE_WEBSITE_DESIGN_PROMPT.

The import uses a direct path instead of the package export. This could break if the internal structure changes.

-import { CLONE_WEBSITE_DESIGN_PROMPT } from '@onlook/ai/src/prompt/clone';
-import { DesignSchema } from '@onlook/models/tools';
+import { CLONE_WEBSITE_DESIGN_PROMPT } from '@onlook/ai';
+import { DesignSchema } from '@onlook/models';

🏁 Script executed:

#!/bin/bash
# Check if these exports are available from the package roots
rg -n "export.*CLONE_WEBSITE_DESIGN_PROMPT" packages/ai/src/index.ts packages/ai/index.ts
rg -n "export.*DesignSchema" packages/models/src/index.ts packages/models/index.ts

Length of output: 125


🏁 Script executed:

#!/usr/bin/env bash
set -e

echo "Root directories:"
ls -1 .

echo -e "\nApps directory (apps/*):"
ls -1 apps

echo -e "\nPackages directory (packages/*):"
ls -1 packages || echo "No packages directory at root"

echo -e "\nSearching for @onlook/ai in package.json:"
rg -n '"name"\s*:\s*"@onlook/ai"' -g "*/package.json"

echo -e "\nSearching for @onlook/models in package.json:"
rg -n '"name"\s*:\s*"@onlook/models"' -g "*/package.json"

AI_PKG_DIR=$(dirname "$(rg -l '"name"\s*:\s*"@onlook/ai"' -g "*/package.json" | head -n1)")
MODELS_PKG_DIR=$(dirname "$(rg -l '"name"\s*:\s*"@onlook/models"' -g "*/package.json" | head -n1)")

echo -e "\nAI package directory: $AI_PKG_DIR"
echo "Models package directory: $MODELS_PKG_DIR"

echo -e "\nListing index files in AI package:"
find "$AI_PKG_DIR" -type f \( -name "index.ts" -o -name "src/index.ts" \)

echo -e "\nListing index files in Models package:"
find "$MODELS_PKG_DIR" -type f \( -name "index.ts" -o -name "src/index.ts" \)

echo -e "\nSearching for export of CLONE_WEBSITE_DESIGN_PROMPT in AI index file(s):"
rg -n "export.*CLONE_WEBSITE_DESIGN_PROMPT" $(find "$AI_PKG_DIR" -type f \( -name "index.ts" -o -name "src/index.ts" \) )

echo -e "\nSearching for export of DesignSchema in Models index file(s):"
rg -n "export.*DesignSchema" $(find "$MODELS_PKG_DIR" -type f \( -name "index.ts" -o -name "src/index.ts" \) )

Length of output: 440


🏁 Script executed:

#!/usr/bin/env bash
set -e

echo "=== packages/ai/package.json ==="
cat packages/ai/package.json

echo -e "\n=== packages/models/package.json ==="
cat packages/models/package.json

echo -e "\n=== files under packages/ai/src/prompt ==="
if [ -d packages/ai/src/prompt ]; then
  find packages/ai/src/prompt -type f
else
  echo "No packages/ai/src/prompt directory"
fi

echo -e "\n=== definition of CLONE_WEBSITE_DESIGN_PROMPT in ai package ==="
rg -n "CLONE_WEBSITE_DESIGN_PROMPT" -g "packages/ai/**" || true

echo -e "\n=== packages/ai/src/index.ts (first 200 lines) ==="
if [ -f packages/ai/src/index.ts ]; then
  sed -n '1,200p' packages/ai/src/index.ts
else
  echo "packages/ai/src/index.ts not found"
fi

echo -e "\n=== packages/ai/index.ts (first 200 lines) ==="
if [ -f packages/ai/index.ts ]; then
  sed -n '1,200p' packages/ai/index.ts
else
  echo "packages/ai/index.ts not found"
fi

echo -e "\n=== files under packages/models/src/tools ==="
if [ -d packages/models/src/tools ]; then
  find packages/models/src/tools -type f
else
  echo "No packages/models/src/tools directory"
fi

echo -e "\n=== definition of DesignSchema in models package ==="
rg -n "DesignSchema" -g "packages/models/**" || true

echo -e "\n=== packages/models/src/index.ts (first 200 lines) ==="
if [ -f packages/models/src/index.ts ]; then
  sed -n '1,200p' packages/models/src/index.ts
else
  echo "packages/models/src/index.ts not found"
fi

echo -e "\n=== packages/models/index.ts (first 200 lines) ==="
if [ -f packages/models/index.ts ]; then
  sed -n '1,200p' packages/models/index.ts
else
  echo "packages/models/index.ts not found"
fi

Length of output: 4454


Use package root exports instead of deep imports

-import { CLONE_WEBSITE_DESIGN_PROMPT } from '@onlook/ai/src/prompt/clone';
-import { DesignSchema } from '@onlook/models/tools';
+import { CLONE_WEBSITE_DESIGN_PROMPT } from '@onlook/ai';
+import { DesignSchema } from '@onlook/models';

Benefits: guards against internal restructuring and aligns with package export contracts.

🤖 Prompt for AI Agents
In apps/web/client/src/server/api/routers/code.ts around lines 11-12 the file
uses deep imports ('@onlook/ai/src/prompt/clone' and '@onlook/models/tools');
change these to use the package root exports (e.g., import
CLONE_WEBSITE_DESIGN_PROMPT from '@onlook/ai' and DesignSchema from
'@onlook/models') so consumers rely on the package export surface instead of
internal paths, and update any named vs default import shape to match the
package root exports.



export const codeRouter = createTRPCRouter({
applyDiff: protectedProcedure
@@ -148,4 +154,133 @@ export const codeRouter = createTRPCRouter({
};
}
}),

cloneWebsite: protectedProcedure
.input(z.object({
url: z.string().url(),
}))
.mutation(async ({ input }): Promise<CloneWebsiteResult> => {
try {
if (!env.FIRECRAWL_API_KEY) {
throw new Error('FIRECRAWL_API_KEY is not configured');
}

const app = new FirecrawlApp({ apiKey: env.FIRECRAWL_API_KEY });

// Scrape the website with screenshot to get visual content
const result = await app.scrapeUrl(input.url, {
Contributor

The two separate scrapeUrl calls run sequentially. Consider running them concurrently with Promise.all to reduce overall latency.
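
A minimal sketch of the concurrent version, using the same options as the calls below:

const [result, imageAssetsResult] = await Promise.all([
    app.scrapeUrl(input.url, {
        formats: ['html', 'screenshot@fullPage', 'markdown'],
        onlyMainContent: false,
        waitFor: 2000,
    }),
    app.scrapeUrl(input.url, {
        formats: ['markdown'],
        onlyMainContent: false,
        includeTags: ['img'],
        waitFor: 2000,
    }),
]);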

formats: ['html', 'screenshot@fullPage', 'markdown'],
onlyMainContent: false,
waitFor: 2000,
});
const imageAssetsResult = await app.scrapeUrl(input.url, {
formats: ['markdown'],
onlyMainContent: false,
includeTags: ['img'],
waitFor: 2000,
});
Comment on lines +170 to +181

🛠️ Refactor suggestion

Add timeout configuration for Firecrawl operations.

The scraping operations use a fixed 2-second wait time which may not be sufficient for all websites. Consider making this configurable or adaptive.

 // Scrape the website with screenshot to get visual content
+const waitTime = Math.min(10000, 2000 + (input.url.length * 10)); // Adaptive wait based on URL complexity, max 10s
 const result = await app.scrapeUrl(input.url, {
     formats: ['html', 'screenshot@fullPage', 'markdown'],
     onlyMainContent: false,
-    waitFor: 2000,
+    waitFor: waitTime,
+    timeout: 30000, // Add overall timeout
 });
 const imageAssetsResult = await app.scrapeUrl(input.url, {
     formats: ['markdown'],
     onlyMainContent: false,
     includeTags: ['img'],
-    waitFor: 2000,
+    waitFor: waitTime,
+    timeout: 30000,
 });
🤖 Prompt for AI Agents
In apps/web/client/src/server/api/routers/code.ts around lines 170 to 181, the
two scrapeUrl calls hardcode waitFor: 2000 which is brittle; make the Firecrawl
timeout configurable and robust by accepting an optional timeout (e.g.,
timeoutMs) from the request input or reading a service-level config/env value,
validate and clamp it to a safe min/max, and pass that value into both scrapeUrl
calls; if no timeout provided fall back to a sensible default (e.g., 2000), and
consider supporting an adaptive strategy (e.g., retry with a larger timeout or
use a networkIdle option if available) so slow pages are handled without
indefinite waits.


if (!result.success) {
throw new Error(`Failed to clone website: ${result.error || 'Unknown error'}`);
}

let imageAssets: {
url: string;
title: string;
}[] = [];
if ('success' in imageAssetsResult && imageAssetsResult.success && imageAssetsResult.markdown) {
const md = imageAssetsResult.markdown;
const mdImgRegex = /!\[([^\]]*)\]\(([^)]+)\)/g;
let match: RegExpExecArray | null;
while ((match = mdImgRegex.exec(md)) !== null) {
const alt = (match[1] || '').trim();
const urlCandidate = match[2];
const title = alt ? alt.replace(/\s+/g, '-') : '';
if (!urlCandidate) continue;
try {
const absoluteUrl = new URL(urlCandidate, input.url).toString();
imageAssets.push({ url: absoluteUrl, title });
} catch {
imageAssets.push({ url: urlCandidate, title });
}
}
} else if (result.html) {
// Fallback: parse from HTML if markdown not available
const imgTagRegex = /<img[^>]*>/gi;
const srcRegex = /src=["']([^"']+)["']/i;
const altRegex = /alt=["']([^"']*)["']/i;
let tagMatch: RegExpExecArray | null;
while ((tagMatch = imgTagRegex.exec(result.html)) !== null) {
const tag = tagMatch[0];
const srcMatch = srcRegex.exec(tag);
if (!srcMatch) continue;
const srcCandidate = srcMatch[1] ?? '';
const altMatch = altRegex.exec(tag);
const alt = (altMatch?.[1] ?? '').trim();
const title = alt ? alt.replace(/\s+/g, '-') : '';
try {
const absoluteUrl = new URL(srcCandidate, input.url).toString();
imageAssets.push({ url: absoluteUrl, title });
} catch {
imageAssets.push({ url: srcCandidate, title });
}
}
}


// Dedupe by URL

const byUrl = new Map<string, { url: string; title: string }>();
for (const asset of imageAssets) {
if (!byUrl.has(asset.url)) {
byUrl.set(asset.url, asset);
}
}
imageAssets = Array.from(byUrl.values());

Comment on lines +187 to +240

🛠️ Refactor suggestion

Improve asset extraction robustness and add validation.

The current implementation has several areas for improvement:

  1. No validation of image URLs (could be data URIs, malformed, etc.)
  2. Missing error handling for invalid URLs
  3. No size or type filtering for assets
 let imageAssets: {
     url: string;
     title: string;
 }[] = [];
+
+// Helper to validate and normalize image URLs
+const isValidImageUrl = (url: string): boolean => {
+    if (!url || url.startsWith('data:')) return false;
+    try {
+        const parsed = new URL(url);
+        return ['http:', 'https:'].includes(parsed.protocol);
+    } catch {
+        return false;
+    }
+};
+
 if ('success' in imageAssetsResult && imageAssetsResult.success && imageAssetsResult.markdown) {
     const md = imageAssetsResult.markdown;
     const mdImgRegex = /!\[([^\]]*)\]\(([^)]+)\)/g;
     let match: RegExpExecArray | null;
     while ((match = mdImgRegex.exec(md)) !== null) {
         const alt = (match[1] || '').trim();
         const urlCandidate = match[2];
         const title = alt ? alt.replace(/\s+/g, '-') : '';
         if (!urlCandidate) continue;
         try {
             const absoluteUrl = new URL(urlCandidate, input.url).toString();
-            imageAssets.push({ url: absoluteUrl, title });
+            if (isValidImageUrl(absoluteUrl)) {
+                imageAssets.push({ url: absoluteUrl, title });
+            }
         } catch {
-            imageAssets.push({ url: urlCandidate, title });
+            // Skip invalid URLs
         }
     }
 }
🤖 Prompt for AI Agents
In apps/web/client/src/server/api/routers/code.ts around lines 187 to 240, the
image extraction block should validate and filter candidates before adding them:
only accept http(s) absolute URLs (reject data:, mailto:, javascript: etc.),
skip obviously malformed URLs by catching URL construction errors (already
present) and continue; perform a HEAD (or fetch with method: 'HEAD') to check
Content-Type starts with image/ and Content-Length (or streamed size) is below a
configured limit (e.g., 5MB) with a short timeout, and skip on any
network/timeout/error; also whitelist common image extensions as a cheap
pre-check before fetching; keep existing dedupe logic but apply it after
filtering; ensure all network calls are try/catch and do not throw to the
caller, logging or silently skipping invalid assets.
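
A sketch of the pre-download check this prompt describes, assuming a runtime with fetch and AbortSignal.timeout; the 5 MB cap and 5 s timeout are illustrative:

const MAX_ASSET_BYTES = 5 * 1024 * 1024; // assumed limit

async function looksLikeDownloadableImage(url: string): Promise<boolean> {
    try {
        const res = await fetch(url, { method: 'HEAD', signal: AbortSignal.timeout(5000) });
        if (!res.ok) return false;
        const type = res.headers.get('content-type') ?? '';
        const length = Number(res.headers.get('content-length') ?? '0');
        return type.startsWith('image/') && length <= MAX_ASSET_BYTES;
    } catch {
        return false; // network error or timeout: skip this asset
    }
}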


const { model, headers } = await initModel({
provider: LLMProvider.ANTHROPIC,
model: ANTHROPIC_MODELS.SONNET_4,
});

const { object } = await generateObject({
model,
headers,
schema: DesignSchema,
messages: [
{
role: 'system',
content: CLONE_WEBSITE_DESIGN_PROMPT,
},
{
role: 'user',
content: `HTML: ${result.html}
Markdown: ${result.markdown}
Screenshot: ${result.screenshot}`,
},
],
maxOutputTokens: 10000,
});
Comment on lines +242 to +264

🛠️ Refactor suggestion

Add error handling for AI model initialization and generation.

The design document generation lacks proper error handling and could fail silently.

+let designDocument: z.infer<typeof DesignSchema> | null = null;
+
+try {
     const { model, headers } = await initModel({
         provider: LLMProvider.ANTHROPIC,
         model: ANTHROPIC_MODELS.SONNET_4,
     });
 
     const { object } = await generateObject({
         model,
         headers,
         schema: DesignSchema,
         messages: [
             {
                 role: 'system',
                 content: CLONE_WEBSITE_DESIGN_PROMPT,
             },
             {
                 role: 'user',
                 content: `HTML: ${result.html}
                 Markdown: ${result.markdown}
                 Screenshot: ${result.screenshot}`,
             },
         ],
         maxOutputTokens: 10000,
     });
 
-    const designDocument: z.infer<typeof DesignSchema> = object;
+    designDocument = object;
+} catch (error) {
+    console.error('Failed to generate design document:', error);
+    // Continue without design document rather than failing entire operation
+}
🤖 Prompt for AI Agents
In apps/web/client/src/server/api/routers/code.ts around lines 242 to 264, the
calls to initModel and generateObject lack error handling and may fail silently;
wrap the initModel and generateObject calls in a try/catch (or separate
try/catches) so failures are caught, log the error with contextual information
(which call failed and relevant inputs), and propagate a clear failure to the
caller (e.g., throw a TRPC/HTTP error with a 500 and concise message or return
an error payload) instead of letting the function continue silently.


const designDocument: z.infer<typeof DesignSchema> = object;

return {
result: {
markdown: result.markdown || '',
html: result.html || '',
designScreenshot: result.screenshot || '',
designDocument: designDocument,
assets: imageAssets,
},
error: null,
};
Comment on lines +268 to +277

⚠️ Potential issue

Add null checks for optional Firecrawl results.

The code assumes all fields from Firecrawl will be present, but they might be undefined.

 return {
     result: {
-        markdown: result.markdown || '',
-        html: result.html || '',
-        designScreenshot: result.screenshot || '',
+        markdown: result.markdown ?? '',
+        html: result.html ?? '',
+        designScreenshot: result.screenshot ?? '',
         designDocument: designDocument,
         assets: imageAssets,
     },
     error: null,
 };
🤖 Prompt for AI Agents
In apps/web/client/src/server/api/routers/code.ts around lines 268-277, the code
assumes all fields returned from Firecrawl exist; add null/undefined checks and
safe defaults when constructing the response so missing fields won't throw.
Coalesce result.markdown and result.html to empty strings if undefined, coalesce
result.screenshot to empty string, ensure designDocument is set to null or an
empty object if missing, and ensure imageAssets is an empty array if undefined;
update any typing if necessary so the response always has those safe defaults.

} catch (error) {
console.error('Error cloning website:', error);
return {
error: error instanceof Error ? error.message : 'Unknown error',
result: null,
};
}
}),
});
11 changes: 11 additions & 0 deletions packages/ai/src/prompt/clone.ts
@@ -0,0 +1,11 @@
export const CLONE_WEBSITE_DESIGN_PROMPT = `You are an expert web designer and UX specialist.

Given the HTML, Markdown, and a screenshot of a web page, analyze the entire page from top to bottom, starting at the very top of the screenshot and continuing all the way to the bottom. Do not miss any section—your goal is to create a complete and exhaustive design document that is as accurate as possible, down to every single pixel.

Break down the page into a dynamic list of sections, ordered from top to bottom as they appear visually. For each section, provide:
- "type": the section type (e.g., "navBar", "hero", "footer", "sidebar", etc.)
- "description": a highly accurate, detailed explanation of the section's content, purpose, and visual appearance. Be specific about layout, spacing, alignment, colors, typography, and any unique style details. Ensure your description is as precise as possible and reflects the exact look and feel of the section, with pixel-level accuracy.
- "styles": a concise summary of the key CSS styles or visual properties that define this section (e.g., background color, font size, padding, margin, border, flex/grid usage, etc.). Focus on what makes the section pixel perfect.


Return your analysis as a JSON object with a "sections" array. Do not include any other text or commentary. Only return the JSON object.`;
14 changes: 13 additions & 1 deletion packages/ai/src/prompt/create.ts
@@ -1,4 +1,16 @@
export const CREATE_NEW_PAGE_SYSTEM_PROMPT = `IMPORTANT:
- The following is the first user message meant to set up the project from a blank slate.
- You will be given a prompt and optional images. You need to update a Next.js project that matches the prompt.
- Try to use a distinct style and infer it from the prompt. For example, if the prompt is for something artistic, you should make this look distinct based on the intent.`;
- Try to use a distinct style and infer it from the prompt. For example, if the prompt is for something artistic, you should make this look distinct based on the intent.
- If the user request satisfies the conditions for using the clone_website tool, call the clone_website tool.


<cloning_instructions>
- Conditions for using the clone_website tool:
- The user request is specifically to clone a website
- The user query explicitly mentions a relevant keyword such as "clone"
- The user query MUST explicitly mentions a concrete website URL. Even if the user request is to clone a website, if the user query does not explicitly mention a concrete website URL, you must ask the user to provide a concrete website URL.
- If the above conditions are met, immediately call the clone_website tool with that website_url
- IMPORTANT: The clone_website tool must be about creating a pixel perfect clone of the website that is related to the original user request.
</cloning_instructions>
`;
12 changes: 12 additions & 0 deletions packages/ai/src/tools/tools/web.ts
@@ -1,5 +1,6 @@
import { tool } from 'ai';
import { z } from 'zod';
import { BRANCH_ID_SCHEMA } from './branch';

export const SCRAPE_URL_TOOL_NAME = 'scrape_url';
export const SCRAPE_URL_TOOL_PARAMETERS = z.object({
Expand Down Expand Up @@ -44,3 +45,14 @@ export const webSearchTool = tool({
description: 'Search the web for up-to-date information',
inputSchema: WEB_SEARCH_TOOL_PARAMETERS,
});

export const CLONE_WEBSITE_TOOL_NAME = 'clone_website';
export const CLONE_WEBSITE_TOOL_PARAMETERS = z.object({
url: z.string().url().describe('The URL to clone. Must be a valid HTTP or HTTPS URL.'),
branchId: BRANCH_ID_SCHEMA,
});
export const cloneWebsiteTool = tool({
description:
'Clone a website by scraping its content and returning the HTML, a markdown version, reference screenshot of what the website looks like, reference design document, and a list of assets that you can use. Use these outputs as references to pixel perfect replicate the website’s design and layout.',
inputSchema: CLONE_WEBSITE_TOOL_PARAMETERS,
});
3 changes: 3 additions & 0 deletions packages/ai/src/tools/toolset.ts
Original file line number Diff line number Diff line change
@@ -36,6 +36,8 @@ import {
webSearchTool,
WRITE_FILE_TOOL_NAME,
writeFileTool,
CLONE_WEBSITE_TOOL_NAME,
cloneWebsiteTool,
} from './tools';

export const ASK_TOOL_SET: ToolSet = {
@@ -61,4 +63,5 @@ export const BUILD_TOOL_SET: ToolSet = {
[SANDBOX_TOOL_NAME]: sandboxTool,
[TERMINAL_COMMAND_TOOL_NAME]: terminalCommandTool,
[TYPECHECK_TOOL_NAME]: typecheckTool,
[CLONE_WEBSITE_TOOL_NAME]: cloneWebsiteTool,
};