Commit 5b7adcd

Merge branch 'main' into hiro/kiosk_mode
2 parents fc36507 + 7154d5f commit 5b7adcd

5 files changed: +74 -4 lines changed
browsers/extensions.mdx

Lines changed: 1 addition & 1 deletion
@@ -88,7 +88,7 @@ kernel extensions upload ./downloaded-extension --name my-extension

 ## Loading an extension into a running browser

-If you have a browser running and would like to load an extension into it after the browser session has started, Kernel also allows you to do that via the CLI (or [API](http://localhost:3000/api-reference/browsers/ad-hoc-upload-one-or-more-unpacked-extensions-to-a-running-browser-instance)):
+If you have a browser running and would like to load an extension into it after the browser session has started, Kernel also allows you to do that via the CLI (or [API](/api-reference/browsers/ad-hoc-upload-one-or-more-unpacked-extensions-to-a-running-browser-instance)):

 ```bash CLI
 kernel browsers extensions upload <session_id> ./my-extension

docs.json

Lines changed: 2 additions & 0 deletions
@@ -99,11 +99,13 @@
       {
         "group": "Integrations",
         "pages": [
+          "integrations/overview",
           "integrations/browser-use",
           {
             "group": "Computer Use",
             "pages": [
               "integrations/computer-use/anthropic",
+              "integrations/computer-use/gemini",
               "integrations/computer-use/openai"
             ]
           },

integrations/computer-use/gemini.mdx

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+---
+title: "Gemini"
+---
+
+Google's [Gemini 2.5 Computer Use model](https://blog.google/technology/google-deepmind/gemini-computer-use-model/) is a specialized model built on Gemini 2.5 Pro's capabilities to power agents that can interact with user interfaces.
+
+By integrating Gemini 2.5 Computer Use with Kernel, you can run these AI-powered browser automations on cloud-hosted infrastructure, eliminating the need for local browser management and enabling scalable, reliable AI agents.
+
+## Quick setup with our example template
+
+Get started quickly with our TypeScript template that demonstrates Gemini 2.5 Computer Use with Kernel.
+
+Check out the [Open-source Gemini Template](https://github.com/onkernel/ts-stagehand-google-cua-agent) repository for a complete working example that shows how to:
+- Set up Gemini 2.5 Computer Use with Kernel
+- Use Stagehand for browser automation
+- Run AI-powered web interactions on cloud infrastructure
+
+## Benefits of using Kernel with Gemini Computer Use
+
+- **No local browser management**: Run Computer Use automations without installing or maintaining browsers locally
+- **Scalability**: Launch multiple browser sessions in parallel for concurrent automations
+- **Stealth mode**: Built-in anti-detection features for web interactions
+- **Session persistence**: Maintain browser state across automation runs
+- **Live view**: Debug your automations with real-time browser viewing
+
+## Next steps
+
+- Check out [live view](/browsers/live-view) for debugging your automations
+- Learn about [stealth mode](/browsers/stealth) for avoiding detection
+- Learn how to properly [terminate browser sessions](/browsers/termination)
+- Learn how to [deploy](/apps/deploy) your Computer Use app to Kernel

integrations/magnitude.mdx

Lines changed: 10 additions & 3 deletions
@@ -27,7 +27,12 @@ const client = new Kernel({
   apiKey: process.env.KERNEL_API_KEY,
 });

-const kernelBrowser = await client.browsers.create();
+const kernelBrowser = await client.browsers.create({
+  viewport: {
+    width: 1920,
+    height: 1080
+  }
+});

 console.log(`Live view url: ${kernelBrowser.browser_live_view_url}`);
 ```
@@ -43,10 +48,12 @@ const agent = await startBrowserAgent({
   browser: {
     cdp: kernelBrowser.cdp_ws_url,
     contextOptions: {
-      viewport: { width: 1024, height: 768 }
+      viewport: {
+        width: 1920,
+        height: 1080
+      }
     }
   },
-  virtualScreenDimensions: { width: 1024, height: 768 },
   llm: {
     provider: 'anthropic',
     options: {
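
The two magnitude.mdx hunks set the same 1920x1080 viewport in two places: once when creating the Kernel browser and once in the agent's `contextOptions`. A minimal sketch of keeping those values in one constant is shown here. The helper names `kernelBrowserOptions` and `agentBrowserOptions` and the placeholder CDP URL are hypothetical and belong to neither SDK. The object shapes only mirror what the diff passes to `client.browsers.create` and `startBrowserAgent`.

```typescript
// Hypothetical helpers (not part of the Kernel or Magnitude SDKs): they keep
// the viewport for browser creation and for the agent in a single constant
// so the two values cannot drift apart.
interface Viewport {
  width: number;
  height: number;
}

const VIEWPORT: Viewport = { width: 1920, height: 1080 };

// Shaped like the argument to `client.browsers.create({ viewport })` in the diff.
function kernelBrowserOptions(viewport: Viewport): { viewport: Viewport } {
  return { viewport: { ...viewport } };
}

// Shaped like the `browser` field passed to `startBrowserAgent` in the diff.
function agentBrowserOptions(cdpUrl: string, viewport: Viewport) {
  return {
    cdp: cdpUrl,
    contextOptions: { viewport: { ...viewport } },
  };
}

const createOptions = kernelBrowserOptions(VIEWPORT);
// Placeholder URL; in the docs snippet this would be `kernelBrowser.cdp_ws_url`.
const agentOptions = agentBrowserOptions("wss://example.invalid/cdp", VIEWPORT);

console.log(createOptions.viewport, agentOptions.contextOptions.viewport);
```

In the documented snippet, `createOptions` would be passed to `client.browsers.create(...)` and `agentOptions` would become the `browser` field of the agent configuration.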

integrations/overview.mdx

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
---
2+
title: "Overview"
3+
---
4+
5+
Kernel's browsers are compatible with all browser and Computer Use frameworks.
6+
7+
## Universal compatibility
8+
9+
Kernel browsers work with any framework or tool that supports the Chrome DevTools Protocol (CDP). This means you can:
10+
11+
- **Use any agent framework**: Integrate with popular frameworks like Browser Use, Stagehand, Playwright, Puppeteer, Selenium, and more
12+
- **Connect via CDP**: All browsers expose a CDP WebSocket URL for direct connection
13+
- **No vendor lock-in**: Switch between frameworks or use multiple frameworks simultaneously
14+
- **Standard protocols**: Built on open standards that work with the entire browser automation ecosystem
15+
16+
## Popular Framework Integrations
17+
18+
Kernel provides detailed guides for popular agent frameworks:
19+
20+
- **[Browser Use](/integrations/browser-use)** - AI browser agent framework
21+
- **[Stagehand](/integrations/stagehand)** - AI browser automation with natural language
22+
- **[Computer Use (Anthropic)](/integrations/computer-use/anthropic)** - Claude's computer use capability
23+
- **[Computer Use (OpenAI)](/integrations/computer-use/openai)** - OpenAI's computer use capability
24+
- **[Magnitude](/integrations/magnitude)** - Vision-focused browser automation framework
25+
- **[Val Town](/integrations/valtown)** - Serverless function runtime
26+
- **[Vercel](https://github.com/onkernel/vercel-template)** - Deploy browser automations to Vercel
27+
28+
## Custom Integrations
29+
30+
Kernel works with any tool that supports CDP. Check out our [browser creation guide](/browsers/create-a-browser) to learn how to connect any other agent framework.
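
The overview's "Connect via CDP" point can be illustrated with a framework-agnostic sketch. It assumes only that a framework offers some function taking a CDP WebSocket URL. The `CdpConnector` type, `connectOverCdp`, `fakeConnector`, and the placeholder URL are all hypothetical names introduced for illustration, not part of Kernel's SDK.

```typescript
// Hypothetical adapter type: any function that accepts a CDP WebSocket URL.
// With Playwright this could wrap `chromium.connectOverCDP(url)`; with
// Puppeteer, `puppeteer.connect({ browserWSEndpoint: url })`.
type CdpConnector<T> = (cdpWsUrl: string) => T;

// Rejects anything that is not a ws:// or wss:// URL before handing it to a framework.
function connectOverCdp<T>(cdpWsUrl: string, connector: CdpConnector<T>): T {
  if (!/^wss?:\/\/\S+$/.test(cdpWsUrl)) {
    throw new Error("Expected a ws:// or wss:// CDP URL, got: " + cdpWsUrl);
  }
  return connector(cdpWsUrl);
}

// Stand-in connector for illustration; a real one would return a browser handle.
const fakeConnector: CdpConnector<{ endpoint: string }> = (url) => ({ endpoint: url });

// Placeholder URL; with Kernel this would be the browser's `cdp_ws_url`.
const session = connectOverCdp("wss://example.invalid/cdp", fakeConnector);
console.log(session.endpoint);
```

Because the connector is just a function, switching frameworks in this sketch means swapping the second argument while the URL handling stays the same.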
