Commit f4d4bdd

feat(browser): add file upload capability
Implements the `upload` action in `BrowserSession` to allow uploading files through the Puppeteer-controlled browser.
- Handles standard file inputs using `elementHandle.uploadFile()`.
- Supports hidden file inputs by temporarily making them visible.
- Includes a fallback mechanism using `page.waitForFileChooser()`.
- Adds `waitForNetworkIdle` and `checkForUploadSuccessIndicators` heuristics to help determine upload completion.
- Includes unit tests (`BrowserSession.upload.test.ts`) and necessary fixtures (`file-upload-test.html`, `test-upload-file.txt`).
- Updates shared types (`ExtensionMessage.ts`) and tool documentation (`browser-action.ts`) to include the new action and clarify the absolute path requirement for `filepath`.
1 parent 5c1246d commit f4d4bdd
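
The strategy described in the commit message (set the file directly on the input, then fall back to the file chooser) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `UploadPage`, `FileInputHandle`, `FileChooserLike`, and `uploadWithFallback` are hypothetical names whose methods mirror the shape of Puppeteer's `page.$`, `elementHandle.uploadFile`, `page.waitForFileChooser`, `page.click`, and `fileChooser.accept`. The 5000 ms timeout is an assumed value.

```typescript
// Hypothetical minimal interfaces standing in for the Puppeteer types.
interface FileInputHandle {
  uploadFile(...paths: string[]): Promise<void>
}
interface FileChooserLike {
  accept(paths: string[]): Promise<void>
}
interface UploadPage {
  $(selector: string): Promise<FileInputHandle | null>
  waitForFileChooser(options?: { timeout?: number }): Promise<FileChooserLike>
  click(selector: string): Promise<void>
}

// Returns which path succeeded so callers can log it.
async function uploadWithFallback(
  page: UploadPage,
  selector: string,
  filepath: string,
): Promise<"direct" | "file-chooser"> {
  // 1. Direct method: set the file on the input element itself.
  const input = await page.$(selector)
  if (input) {
    try {
      await input.uploadFile(filepath)
      return "direct"
    } catch {
      // Fall through to the file chooser fallback.
    }
  }

  // 2. Fallback: start waiting for the chooser before the click that opens it.
  const [chooser] = await Promise.all([
    page.waitForFileChooser({ timeout: 5000 }), // assumed timeout
    page.click(selector),
  ])
  await chooser.accept([filepath])
  return "file-chooser"
}
```

The wait is registered before the click because the chooser can only be intercepted if something is already listening when it opens.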

3 files changed: +272 -107 lines changed


src/core/prompts/tools/browser-action.ts

Lines changed: 7 additions & 4 deletions
@@ -8,6 +8,7 @@ export function getBrowserActionDescription(args: ToolArgs): string | undefined
 Description: Request to interact with a Puppeteer-controlled browser. Every action, except \`close\`, will be responded to with a screenshot of the browser's current state, along with any new console logs. You may only perform one browser action per message, and wait for the user's response including a screenshot and logs to determine the next action.
 - The sequence of actions **must always start with** launching the browser at a URL, and **must always end with** closing the browser. If you need to visit a new URL that is not possible to navigate to from the current webpage, you must first close the browser, then launch again at the new URL.
 - While the browser is active, only the \`browser_action\` tool can be used. No other tools should be called during this time. You may proceed to use other tools only after closing the browser. For example if you run into an error and need to fix a file, you must close the browser, then use other tools to make the necessary changes, then re-launch the browser to verify the result.
+- **IMPORTANT: NEVER click on buttons that would open a system file dialog.** Instead, use the \`upload\` action directly with the file input element's selector and the desired file path.
 - The browser window has a resolution of **${args.browserViewportSize}** pixels. When performing any click actions, ensure the coordinates are within this resolution range.
 - Before clicking on any elements such as icons, links, or buttons, you must consult the provided screenshot of the page to determine the coordinates of the element. The click should be targeted at the **center of the element**, not on its edges.
 Parameters:
@@ -18,13 +19,15 @@ Parameters:
     * click: Click at a specific x,y coordinate.
         - Use with the \`coordinate\` parameter to specify the location.
         - Always click in the center of an element (icon, button, link, etc.) based on coordinates derived from a screenshot.
+        - **DO NOT click on buttons labeled "Upload", "Choose File", "Browse", or similar that would open system file dialogs.** Use the \`upload\` action instead.
     * type: Type a string of text on the keyboard. You might use this after clicking on a text field to input text.
         - Use with the \`text\` parameter to provide the string to type.
     * scroll_down: Scroll down the page by one page height.
     * scroll_up: Scroll up the page by one page height.
-    * upload: Upload a file to a file input element on the page.
-        - Use with the \`selector\` parameter to specify the CSS selector for the file input element.
+    * upload: **UPLOAD FILES DIRECTLY WITH THIS ACTION - DO NOT CLICK ON UPLOAD BUTTONS.** This action uploads a file to a file input element on the page by directly setting its value, bypassing any system file dialogs.
+        - Use with the \`selector\` parameter to specify the CSS selector for the file input element (typically \`input[type="file"]\`).
         - Use with the \`filepath\` parameter to specify the path to the file to upload.
+        - **IMPORTANT:** Always use this action instead of clicking on buttons that would open system file dialogs.
     * close: Close the Puppeteer-controlled browser instance. This **must always be the final browser action**.
         - Example: \`<action>close</action>\`
 - url: (optional) Use this for providing the URL for the \`launch\` action.
@@ -35,8 +38,8 @@ Parameters:
     * Example: <text>Hello, world!</text>
 - selector: (optional) The CSS selector for the file input element for the \`upload\` action.
     * Example: <selector>input[type="file"]</selector>
-- filepath: (optional) The path to the file to upload for the \`upload\` action.
-    * Example: <filepath>/path/to/file.jpg</filepath>
+- filepath: (optional) The **absolute** path to the file to upload for the \`upload\` action.
+    * Example: <filepath>/Users/username/path/to/file.jpg</filepath>
 Usage:
 <browser_action>
 <action>Action to perform (e.g., launch, click, type, scroll_down, scroll_up, upload, close)</action>
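
To make the `upload` parameters above concrete, here is a small sketch that assembles an `upload` request in the tag format shown in the tool description and rejects relative paths up front. The helper names `isAbsolutePath` and `buildUploadAction` are hypothetical, and the absolute-path check is a simplified assumption (POSIX root, Windows drive letter, or UNC prefix) rather than the check the extension performs.

```typescript
// Simplified, assumed check: POSIX root ("/"), Windows drive ("C:\" or "C:/"), or UNC ("\\server").
function isAbsolutePath(p: string): boolean {
  return /^(\/|[A-Za-z]:[\\/]|\\\\)/.test(p)
}

// Builds the text of an upload request using the tags from the tool description.
function buildUploadAction(selector: string, filepath: string): string {
  if (!isAbsolutePath(filepath)) {
    throw new Error(`filepath must be absolute, got: ${filepath}`)
  }
  return [
    "<browser_action>",
    "<action>upload</action>",
    `<selector>${selector}</selector>`,
    `<filepath>${filepath}</filepath>`,
    "</browser_action>",
  ].join("\n")
}

console.log(buildUploadAction('input[type="file"]', "/Users/username/path/to/file.jpg"))
```

Failing early on a relative path gives a clearer error than letting the browser-side upload fail later.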

src/services/browser/BrowserSession.ts

Lines changed: 137 additions & 103 deletions
@@ -243,9 +243,14 @@ export class BrowserSession {
         try {
             await action(this.page)
         } catch (err) {
-            if (!(err instanceof TimeoutError)) {
+            // Re-throw the file chooser error to ensure it propagates to the caller
+            if (err instanceof Error && err.message.includes("Clicking this element opened a file selection dialog")) {
+                throw err // Re-throw to ensure the click action fails properly
+            } else if (!(err instanceof TimeoutError)) {
                 logs.push(`[Error] ${err.toString()}`)
             }
+            // Re-throw the error to be caught by the caller (e.g., the test)
+            throw err
         }
 
         // Wait for console inactivity, with a timeout
@@ -343,10 +348,15 @@ export class BrowserSession {
         const [x, y] = coordinate.split(",").map(Number)
         return this.doAction(async (page) => {
             // Set up file chooser interception
+            console.log("[TEST_DEBUG] Attaching filechooser listener")
             page.once("filechooser", async (fileChooser) => {
-                // This will be handled by the upload method if needed
-                console.log("File chooser detected, but no files provided")
+                console.log("[TEST_DEBUG] Filechooser listener triggered!")
+                // Intercept file chooser triggered by click and throw an error
+                throw new Error(
+                    "Clicking this element opened a file selection dialog. You MUST use the 'upload' action with a valid CSS selector for the file input element and the file path instead of clicking.",
+                )
             })
+            console.log("[TEST_DEBUG] Filechooser listener attached")
 
             // Set up network activity monitoring
             let hasNetworkActivity = false
@@ -358,7 +368,9 @@ export class BrowserSession {
             page.on("request", requestListener)
 
             // Perform the click
+            console.log(`[TEST_DEBUG] Performing click at ${x},${y}`)
             await page.mouse.click(x, y)
+            console.log(`[TEST_DEBUG] Click performed at ${x},${y}`)
             this.currentMousePosition = coordinate
 
             // Small delay before checking for navigation
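
One thing to keep in mind with the click hunks above: in general, an exception thrown inside an async event listener rejects that listener's own promise and is not delivered to the code awaiting the click, so it may not reach the `catch` in `doAction`. A possible alternative, sketched here under assumptions, is to start `page.waitForFileChooser()` before the click and throw from the calling code once the click has finished. `ClickPage` and `clickRejectingFileChooser` are hypothetical names, the 500 ms timeout is an assumed value, and this is not the code from this commit.

```typescript
// Hypothetical minimal interface mirroring the Puppeteer calls used below.
interface ClickPage {
  waitForFileChooser(options?: { timeout?: number }): Promise<unknown>
  mouse: { click(x: number, y: number): Promise<void> }
}

async function clickRejectingFileChooser(
  page: ClickPage,
  x: number,
  y: number,
  chooserTimeoutMs = 500, // assumed value
): Promise<void> {
  // Start waiting before the click. Resolves to true if a chooser opened,
  // false if the wait rejected (for example, because it timed out).
  const chooserOpened = page
    .waitForFileChooser({ timeout: chooserTimeoutMs })
    .then(() => true, () => false)

  await page.mouse.click(x, y)

  // Throwing here, in the awaited code path, lets the caller's try/catch see the error.
  if (await chooserOpened) {
    throw new Error("Clicking this element opened a file selection dialog. Use the 'upload' action instead.")
  }
}
```

The trade-off is latency: every click that does not open a chooser waits for the full timeout, so the value would need tuning.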
@@ -415,16 +427,16 @@ export class BrowserSession {
     }
 
     /**
-     * Uploads a file to a file input element identified by the selector
-     *
-     * This method implements a robust file upload strategy with multiple fallback mechanisms:
-     * 1. First tries to directly set the file on the input element (works with hidden inputs)
-     * 2. Falls back to using the FileChooser API if direct method fails
-     *
-     * @param selector CSS selector for the file input element
-     * @param filepath Path to the file to upload
-     * @returns Browser action result with screenshot and logs
-     */
+     * Uploads a file to a file input element identified by the selector
+     *
+     * This method implements a robust file upload strategy with multiple fallback mechanisms:
+     * 1. First tries to directly set the file on the input element (works with hidden inputs)
+     * 2. Falls back to using the FileChooser API if direct method fails
+     *
+     * @param selector CSS selector for the file input element
+     * @param filepath Path to the file to upload
+     * @returns Browser action result with screenshot and logs
+     */
     async upload(selector: string, filepath: string): Promise<BrowserActionResult> {
         return this.doAction(async (page) => {
             // Verify file exists before attempting upload
@@ -458,11 +470,11 @@ export class BrowserSession {
     }
 
     /**
-     * Verifies that the file exists before attempting to upload it
-     *
-     * @param filepath Path to the file to upload
-     * @throws Error if the file does not exist
-     */
+     * Verifies that the file exists before attempting to upload it
+     *
+     * @param filepath Path to the file to upload
+     * @throws Error if the file does not exist
+     */
     private async verifyFileExists(filepath: string): Promise<void> {
         try {
             await fs.access(filepath)
@@ -472,12 +484,15 @@ export class BrowserSession {
     }
 
     /**
-     * Sets up network monitoring to track upload activity
-     *
-     * @param page The Puppeteer page
-     * @returns Object containing request and response listeners
-     */
-    private setupNetworkMonitoring(page: Page): { requestListener: (request: any) => void; responseListener: (response: any) => void } {
+     * Sets up network monitoring to track upload activity
+     *
+     * @param page The Puppeteer page
+     * @returns Object containing request and response listeners
+     */
+    private setupNetworkMonitoring(page: Page): {
+        requestListener: (request: any) => void
+        responseListener: (response: any) => void
+    } {
         const requestListener = (request: any) => {
             const url = request.url()
             if (request.method() === "POST" || request.method() === "PUT") {
@@ -499,14 +514,14 @@ export class BrowserSession {
     }
 
     /**
-     * Attempts to upload a file using multiple methods with fallbacks
-     *
-     * @param page The Puppeteer page
-     * @param selector CSS selector for the file input element
-     * @param filepath Path to the file to upload
-     * @param initialUrl The URL before the upload started
-     * @param initialContent The page content before the upload started
-     */
+     * Attempts to upload a file using multiple methods with fallbacks
+     *
+     * @param page The Puppeteer page
+     * @param selector CSS selector for the file input element
+     * @param filepath Path to the file to upload
+     * @param initialUrl The URL before the upload started
+     * @param initialContent The page content before the upload started
+     */
     private async attemptUploadWithMultipleMethods(
         page: Page,
         selector: string,
@@ -534,15 +549,15 @@ export class BrowserSession {
     }
 
     /**
-     * Uploads a file using the direct input method
-     * This method works by directly setting the file on the input element
-     *
-     * @param page The Puppeteer page
-     * @param selector CSS selector for the file input element
-     * @param filepath Path to the file to upload
-     * @param initialUrl The URL before the upload started
-     * @param initialContent The page content before the upload started
-     */
+     * Uploads a file using the direct input method
+     * This method works by directly setting the file on the input element
+     *
+     * @param page The Puppeteer page
+     * @param selector CSS selector for the file input element
+     * @param filepath Path to the file to upload
+     * @param initialUrl The URL before the upload started
+     * @param initialContent The page content before the upload started
+     */
     private async uploadWithDirectInputMethod(
         page: Page,
         selector: string,
@@ -581,14 +596,14 @@ export class BrowserSession {
     }
 
     /**
-     * Finds the target file input element based on the selector
-     * Falls back to the first available file input if the selector doesn't match
-     *
-     * @param page The Puppeteer page
-     * @param selector CSS selector for the file input element
-     * @param fileInputs Array of file input elements found on the page
-     * @returns The target file input element or null if not found
-     */
+     * Finds the target file input element based on the selector
+     * Falls back to the first available file input if the selector doesn't match
+     *
+     * @param page The Puppeteer page
+     * @param selector CSS selector for the file input element
+     * @param fileInputs Array of file input elements found on the page
+     * @returns The target file input element or null if not found
+     */
     private async findTargetFileInput(
         page: Page,
         selector: string,
@@ -626,13 +641,13 @@ export class BrowserSession {
     }
 
     /**
-     * Makes a file input element accessible by modifying its styles
-     * This is necessary for hidden file inputs that are not directly interactive
-     *
-     * @param page The Puppeteer page
-     * @param inputElement The file input element
-     * @returns The original styles to restore later
-     */
+     * Makes a file input element accessible by modifying its styles
+     * This is necessary for hidden file inputs that are not directly interactive
+     *
+     * @param page The Puppeteer page
+     * @param inputElement The file input element
+     * @returns The original styles to restore later
+     */
     private async makeFileInputAccessible(page: Page, inputElement: ElementHandle<Element>): Promise<any> {
         return page.evaluate((el) => {
             // Cast to HTMLInputElement to access style and other properties
@@ -658,12 +673,12 @@ export class BrowserSession {
     }
 
     /**
-     * Restores the original styles of a file input element
-     *
-     * @param page The Puppeteer page
-     * @param inputElement The file input element
-     * @param originalStyles The original styles to restore
-     */
+     * Restores the original styles of a file input element
+     *
+     * @param page The Puppeteer page
+     * @param inputElement The file input element
+     * @param originalStyles The original styles to restore
+     */
     private async restoreFileInputStyles(
         page: Page,
         inputElement: ElementHandle<Element>,
@@ -683,15 +698,15 @@ export class BrowserSession {
     }
 
     /**
-     * Attempts to upload a file using the FileChooser API
-     * This is a fallback method that works by clicking on the file input
-     *
-     * @param page The Puppeteer page
-     * @param selector CSS selector for the file input element
-     * @param filepath Path to the file to upload
-     * @param initialUrl The URL before the upload started
-     * @param initialContent The page content before the upload started
-     */
+     * Attempts to upload a file using the FileChooser API
+     * This is a fallback method that works by clicking on the file input
+     *
+     * @param page The Puppeteer page
+     * @param selector CSS selector for the file input element
+     * @param filepath Path to the file to upload
+     * @param initialUrl The URL before the upload started
+     * @param initialContent The page content before the upload started
+     */
     private async uploadWithFileChooserAPI(
         page: Page,
         selector: string,
@@ -722,18 +737,18 @@ export class BrowserSession {
     }
 
     /**
-     * Waits for an upload to complete by monitoring network activity, URL changes, and UI changes
-     *
-     * This method uses multiple strategies to detect when an upload has completed:
-     * 1. Monitors network activity and waits for it to settle
-     * 2. Checks for URL changes that might indicate navigation after upload
-     * 3. Checks for content changes that might indicate successful upload
-     * 4. Looks for common success indicators in the page content
-     *
-     * @param page The Puppeteer page
-     * @param initialUrl The URL before the upload started
-     * @param initialContent The page content before the upload started
-     */
+     * Waits for an upload to complete by monitoring network activity, URL changes, and UI changes
+     *
+     * This method uses multiple strategies to detect when an upload has completed:
+     * 1. Monitors network activity and waits for it to settle
+     * 2. Checks for URL changes that might indicate navigation after upload
+     * 3. Checks for content changes that might indicate successful upload
+     * 4. Looks for common success indicators in the page content
+     *
+     * @param page The Puppeteer page
+     * @param initialUrl The URL before the upload started
+     * @param initialContent The page content before the upload started
+     */
     private async waitForUploadToComplete(page: Page, initialUrl: string, initialContent: string): Promise<void> {
         console.log("[UPLOAD] Waiting for upload to complete...")
 
@@ -757,10 +772,10 @@ export class BrowserSession {
     }
 
     /**
-     * Waits for network activity to settle (no new activity for 2 seconds)
-     *
-     * @param page The Puppeteer page
-     */
+     * Waits for network activity to settle (no new activity for 2 seconds)
+     *
+     * @param page The Puppeteer page
+     */
     private async waitForNetworkActivityToSettle(page: Page): Promise<void> {
         // Track network activity
         let lastNetworkActivityTime = Date.now()
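
The body of `waitForNetworkActivityToSettle` is elided from this diff, so the following is only a sketch of the general "quiet period" technique its doc comment describes (no new activity for 2 seconds). `RequestEmitter` and `waitForNetworkQuiet` are hypothetical names; the interface mirrors the `on`/`off` event methods a Puppeteer page exposes, and the 30 second cap and 100 ms poll interval are assumed values.

```typescript
// Hypothetical minimal interface standing in for the page's event methods.
interface RequestEmitter {
  on(event: "request" | "response", listener: () => void): unknown
  off(event: "request" | "response", listener: () => void): unknown
}

// Resolves true once no request/response has been seen for quietMs,
// or false if maxWaitMs elapses first.
async function waitForNetworkQuiet(
  page: RequestEmitter,
  quietMs = 2000,
  maxWaitMs = 30000, // assumed cap
  pollMs = 100, // assumed poll interval
): Promise<boolean> {
  let lastActivity = Date.now()
  const touch = () => {
    lastActivity = Date.now()
  }
  page.on("request", touch)
  page.on("response", touch)

  const start = Date.now()
  try {
    while (Date.now() - start < maxWaitMs) {
      if (Date.now() - lastActivity >= quietMs) {
        return true
      }
      await new Promise<void>((resolve) => setTimeout(resolve, pollMs))
    }
    return false
  } finally {
    // Always detach the listeners so they do not accumulate across calls.
    page.off("request", touch)
    page.off("response", touch)
  }
}
```

Puppeteer also provides `page.waitForNetworkIdle()`, which may cover the same need without hand-rolled polling.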
@@ -808,12 +823,12 @@ export class BrowserSession {
     }
 
     /**
-     * Detects changes that might indicate a successful upload
-     *
-     * @param page The Puppeteer page
-     * @param initialUrl The URL before the upload started
-     * @param initialContent The page content before the upload started
-     */
+     * Detects changes that might indicate a successful upload
+     *
+     * @param page The Puppeteer page
+     * @param initialUrl The URL before the upload started
+     * @param initialContent The page content before the upload started
+     */
     private async detectUploadCompletionChanges(page: Page, initialUrl: string, initialContent: string): Promise<void> {
         // Check for URL changes that might indicate successful upload
         const currentUrl = page.url()
@@ -836,16 +851,16 @@ export class BrowserSession {
         }
     }
     /**
-     * Checks for common indicators that an upload was successful
-     *
-     * This method looks for:
-     * 1. Success text messages in the page content
-     * 2. Success elements/icons that might indicate completion
-     * 3. New file elements that might have appeared after upload
-     *
-     * @param page The Puppeteer page
-     * @returns Array of success indicators found
-     */
+     * Checks for common indicators that an upload was successful
+     *
+     * This method looks for:
+     * 1. Success text messages in the page content
+     * 2. Success elements/icons that might indicate completion
+     * 3. New file elements that might have appeared after upload
+     *
+     * @param page The Puppeteer page
+     * @returns Array of success indicators found
+     */
     private async checkForUploadSuccessIndicators(page: Page): Promise<string[]> {
         const indicators: string[] = []
 
@@ -924,4 +939,23 @@ export class BrowserSession {
 
         return indicators
     }
+
+    // Helper method for testing to get element coordinates
+    async getElementCoordinatesForTest(selector: string): Promise<{ x: number; y: number } | null> {
+        if (!this.page) {
+            throw new Error("Browser page not available")
+        }
+        const element = await this.page.$(selector)
+        if (!element) {
+            return null
+        }
+        const boundingBox = await element.boundingBox()
+        if (!boundingBox) {
+            return null
+        }
+        return {
+            x: Math.round(boundingBox.x + boundingBox.width / 2),
+            y: Math.round(boundingBox.y + boundingBox.height / 2),
+        }
+    }
 }
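
The center-point arithmetic in `getElementCoordinatesForTest` can be checked in isolation. The sketch below pulls it into a pure function and formats the result as the `"x,y"` string that `click` parses with `coordinate.split(",")`. `Box`, `centerOf`, and `toCoordinate` are hypothetical names introduced for illustration and are not part of this commit.

```typescript
// Same fields as the bounding box used in the helper above.
interface Box {
  x: number
  y: number
  width: number
  height: number
}

// Rounded center point of a bounding box.
function centerOf(box: Box): { x: number; y: number } {
  return {
    x: Math.round(box.x + box.width / 2),
    y: Math.round(box.y + box.height / 2),
  }
}

// Formats the center as the "x,y" coordinate string accepted by the click action.
function toCoordinate(box: Box): string {
  const { x, y } = centerOf(box)
  return `${x},${y}`
}

console.log(toCoordinate({ x: 10, y: 20, width: 100, height: 30 })) // prints "60,35"
```

Keeping the arithmetic separate from the page lookup makes it testable without launching a browser.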
