Run OpenAGI's Lux computer-use models against serverless browsers powered by Kernel.
📹 Watch the demo video - Shows the Lux agent navigating to agiopen.org using a Kernel serverless browser.
OpenAGI is an AI research organization building foundation models for computer use. Their Lux model is a vision-language model specifically designed to control computers by:
- Analyzing screenshots to understand the current UI state
- Deciding on the next action (click, type, scroll, etc.)
- Executing actions in a screenshot-action loop until the task is complete
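The loop described above can be sketched in a few lines. This is a simplified illustration, not the Lux or oagi SDK implementation: `Action`, `run_loop`, and the `capture`, `decide`, and `execute` callables are hypothetical names standing in for the screenshot provider, the model, and the action handler.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """Hypothetical action record; real Lux actions carry more detail."""
    kind: str            # e.g. "click", "type", "scroll", or "done"
    argument: str = ""


def run_loop(
    capture: Callable[[], bytes],
    decide: Callable[[bytes, str], Action],
    execute: Callable[[Action], None],
    instruction: str,
    max_steps: int = 20,
) -> bool:
    """Repeat screenshot -> decide -> act until the model reports completion."""
    for _ in range(max_steps):
        screenshot = capture()                    # 1. observe the current UI state
        action = decide(screenshot, instruction)  # 2. model picks the next action
        if action.kind == "done":
            return True                           # task finished within the budget
        execute(action)                           # 3. perform the action
    return False                                  # step budget exhausted


# Toy stand-ins so the sketch runs without a browser or a model.
script = iter([Action("click", "search box"), Action("type", "agiopen.org"), Action("done")])
performed: list[Action] = []
finished = run_loop(
    capture=lambda: b"fake-png-bytes",
    decide=lambda _shot, _task: next(script),
    execute=performed.append,
    instruction="Open agiopen.org",
)
print(finished, [a.kind for a in performed])
```

The `max_steps` cap plays the same role as the `max_steps=20` argument passed to the agent in the examples further down: it bounds the loop when the model never reports completion.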
Kernel provides Browsers-as-a-Service for AI agents and browser automation. Key features:
- Serverless Browsers: Instantly launch browsers without managing infrastructure
- Computer Controls API: Native OS-level mouse, keyboard, and screenshot controls
- Stealth Mode: Built-in anti-detection for reliable web automation
- Video Replays: Record browser sessions as MP4 videos
- Scalability: Run hundreds of concurrent browser sessions
Choose the right model for your task:
| Model | Best For | Avoid When |
|---|---|---|
| lux-actor-1 | Fast execution (1s/step), simple linear tasks (10-20 steps) | Complex reasoning, comparison, or handling ambiguous instructions |
| lux-thinker-1 | Complex planning, comparison tasks ("find cheapest"), handling ambiguity | Low latency needs or simple click-paths |

| Approach | Best For | Trade-offs |
|---|---|---|
| AsyncDefaultAgent (Actor) | Fast UI control, short tasks, real-time automation | Requires clear, step-by-step instructions |
| AsyncDefaultAgent (Thinker) | Research tasks, ambiguous goals, complex planning | Slower execution, best for async workflows |
| TaskerAgent (Actor) | Multi-step forms, structured workflows, repeatable processes | Requires defining todos upfront |
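
The guidance in the two tables can be condensed into a small helper. This is a sketch of one possible decision rule, not part of the oagi SDK: `choose_setup` and its parameters are hypothetical, and the 20-step threshold simply mirrors the "10-20 steps" guidance above.

```python
def choose_setup(needs_reasoning: bool, has_fixed_steps: bool, expected_steps: int) -> tuple[str, str]:
    """Return (agent_class_name, model_name) following the tables above."""
    if has_fixed_steps and not needs_reasoning:
        # Known, repeatable workflow: define todos upfront and use the fast model.
        return ("TaskerAgent", "lux-actor-1")
    if needs_reasoning or expected_steps > 20:
        # Ambiguous goals, comparisons, or long tasks: accept slower steps.
        return ("AsyncDefaultAgent", "lux-thinker-1")
    # Short, linear click-paths with clear instructions.
    return ("AsyncDefaultAgent", "lux-actor-1")


print(choose_setup(needs_reasoning=True, has_fixed_steps=False, expected_steps=5))
```

Treat the returned pair as a starting point; a task that sits on the boundary is usually worth trying with both models.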
This integration connects OpenAGI's Lux model to Kernel's serverless browsers using custom providers:
- KernelScreenshotProvider: Captures screenshots using Kernel's Computer Controls API
- KernelActionHandler: Translates Lux actions (click, type, scroll) to Kernel commands
- KernelBrowserSession: Manages browser lifecycle with automatic video recording
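
Part of translating Lux actions into Kernel commands is mapping key names between the two vocabularies. The sketch below illustrates the idea only: `KEY_ALIASES`, `translate_key`, and `translate_hotkey` are hypothetical, and the assumption that the target side expects X11-style keysym names (such as `Return`) should be checked against the Kernel documentation and `kernel_handler.py`.

```python
# Hypothetical mapping from model-style key names to X11-style keysym names.
KEY_ALIASES = {
    "enter": "Return",
    "esc": "Escape",
    "backspace": "BackSpace",
    "pagedown": "Next",
    "pageup": "Prior",
    "cmd": "super",
}


def translate_key(name: str) -> str:
    """Translate one key name; unknown names pass through unchanged."""
    return KEY_ALIASES.get(name.lower(), name)


def translate_hotkey(combo: str) -> str:
    """Translate a '+'-joined combination such as 'ctrl+enter' part by part."""
    return "+".join(translate_key(part.strip()) for part in combo.split("+"))


print(translate_hotkey("ctrl+enter"))  # ctrl+Return
```

Passing unknown names through unchanged keeps ordinary character keys working without listing each one in the table.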
Use AsyncDefaultAgent for high-level tasks, with either the lux-actor-1 model for immediate tasks or lux-thinker-1 for more complex, longer-running tasks:
```python
async with KernelBrowserSession(
    record_replay=True,
    replay_output_path=replay_output,
) as session:
    # Create the screenshot provider and action handler
    provider = KernelScreenshotProvider(session)
    handler = KernelActionHandler(session)

    agent = AsyncDefaultAgent(
        api_key=os.getenv("OAGI_API_KEY"),
        max_steps=20,
        model="lux-actor-1",  # or "lux-thinker-1" for complex planning tasks
    )

    # Execute the task
    print(f"\nExecuting task: {instruction}\n")
    success = await agent.execute(
        instruction=instruction,
        action_handler=handler,
        image_provider=provider,
    )
```

Use TaskerAgent for structured workflows with predefined steps:
```python
async with KernelBrowserSession(
    record_replay=True,
    replay_output_path=replay_output,
) as session:
    provider = KernelScreenshotProvider(session)
    handler = KernelActionHandler(session)

    tasker_agent = TaskerAgent(
        api_key=os.getenv("OAGI_API_KEY"),
        base_url=os.getenv("OAGI_BASE_URL", "https://api.agiopen.org"),
    )

    tasker_agent.set_task(
        task="Navigate to the 'What is Computer Use' section of the OAGI homepage.",
        todos=[
            "Go to https://agiopen.org.",
            "Make sure to press enter to start the navigation.",
            "Click on the 'What is Computer Use?' button.",
        ],
    )

    result = await tasker_agent.execute(
        instruction="",
        action_handler=handler,
        image_provider=provider,
    )
```

```bash
uv pip install kernel oagi python-dotenv Pillow
```

Create a .env file with your API keys:
```
KERNEL_API_KEY=your_kernel_api_key
OAGI_API_KEY=your_openagi_api_key
```
Get your API keys:
- Kernel: dashboard.onkernel.com
- OpenAGI: developer.agiopen.org
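
A missing key usually surfaces later as an opaque authentication error, so it can help to check for both variables before launching a browser. The sketch below is one way to do that; `REQUIRED_KEYS` and `missing_keys` are hypothetical helpers, not part of either SDK. In a real script you would typically call `load_dotenv()` from python-dotenv first so that the values in `.env` are present in `os.environ`.

```python
import os
from typing import Mapping

REQUIRED_KEYS = ("KERNEL_API_KEY", "OAGI_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or blank."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


# Example with an explicit mapping instead of the real environment.
print(missing_keys({"KERNEL_API_KEY": "example-value"}))  # ['OAGI_API_KEY']
```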
For AsyncDefaultAgent (supports lux-actor-1 or lux-thinker-1):
```bash
python agent_default.py
```

For TaskerAgent (structured workflows with todos):

```bash
python agent_tasker.py
```

The agent will:
- Launch a serverless browser via Kernel
- Start recording a video replay
- Execute the task using Lux's vision-action loop
- Save the replay as video_agent_replay.mp4
- Clean up the browser session
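
The launch, record, execute, save, and clean-up sequence maps naturally onto an async context manager, which is how the snippets above use KernelBrowserSession (`async with`). The sketch below shows that pattern with a stand-in that only logs each phase; `fake_session` and `events` are hypothetical, and this is not the actual `kernel_session.py` implementation.

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []


@asynccontextmanager
async def fake_session(record_replay: bool = True):
    """Hypothetical stand-in for a browser session with replay recording."""
    events.append("launch browser")
    if record_replay:
        events.append("start replay")
    try:
        yield "session"
    finally:
        # Runs even if the task raises, so the browser is never leaked.
        if record_replay:
            events.append("save replay")
        events.append("delete browser")


async def main() -> None:
    async with fake_session() as session:
        events.append(f"run task in {session}")


asyncio.run(main())
print(events)
```

Putting the teardown in a `finally` block is the key design choice: the replay is saved and the browser released whether the task succeeds, fails, or is cancelled.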
```
├── agent_default.py         # AsyncDefaultAgent example (lux-actor-1/lux-thinker-1)
├── agent_tasker.py          # TaskerAgent example with todos (lux-actor-1)
├── kernel_session.py        # Browser lifecycle & replay management
├── kernel_provider.py       # Screenshot provider using Kernel API
├── kernel_handler.py        # Action handler with key translation
├── pyproject.toml           # Project dependencies
└── video_agent_replay.mp4   # Recorded demo video
```
MIT