This document details the implementation of the AI agent vision system in Planeo, focusing on how AI agents perceive and display their environment.
AI agents in Planeo have their own virtual cameras within the 3D scene. These cameras are used to capture images of their surroundings, providing them with visual input for decision-making and displaying their perspective to the user.
- Capture Mechanism: The `useAIAgentController` hook (`src/hooks/useAIAgentController.ts`) is responsible for managing AI agents' vision.
  - Each AI agent has a dedicated `PerspectiveCamera` and a `WebGLRenderTarget`.
  - The visual representation (what the AI "sees") is updated frequently, controlled by `VISUAL_UPDATE_INTERVAL_MS` (currently 100 milliseconds, aiming for ~10 FPS). This involves rendering the scene from the AI's perspective and updating the displayed image.
  - The AI's decision-making process, which includes calling an LLM, happens less frequently, controlled by `DECISION_MAKING_INTERVAL_MS` (currently 7000 milliseconds). This ensures that visual updates are fast and fluid, while LLM calls are made at a more controlled rate.
  - The rendered image for both visual updates and decision-making is converted to a data URL (PNG format).
- State Management: The generated image data URL for each AI agent (from the frequent visual updates) is stored in a Zustand store (`useAIVisionStore` in `src/stores/aiVisionStore.ts`) using the `setAIAgentView` action.
- Display Component:
  - The `AIAgentViews` component (`src/app/components/AIAgentViews.tsx`) subscribes to the `useAIVisionStore`.
  - When the image data URL for an agent updates in the store, this component re-renders, displaying the new image in the top-left and top-right corners of the screen.
  - The images are displayed at a resolution of 160x100 pixels, scaled down from the capture resolution of 320x200 pixels.
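
The state-management step above can be illustrated with a dependency-free sketch of the data a store like `useAIVisionStore` might hold. This is not the actual Planeo store: the `views` field, the string agent IDs, and the pure-function form of `setAIAgentView` are assumptions made so the example runs without Zustand. In the real code this logic would live inside the Zustand store's `set` callback.

```typescript
// Hypothetical state shape: one PNG data URL per agent, keyed by agent ID.
// The field name `views` and string IDs are assumptions, not taken from Planeo.
type AIVisionState = {
  views: Record<string, string>;
};

const initialState: AIVisionState = { views: {} };

// Pure version of a `setAIAgentView`-style action. It returns a new state
// object so that subscribers comparing by reference can detect the change.
function setAIAgentView(
  state: AIVisionState,
  agentId: string,
  dataUrl: string,
): AIVisionState {
  // Return the same object when the frame is unchanged, so no re-render is triggered.
  if (state.views[agentId] === dataUrl) return state;
  return { views: { ...state.views, [agentId]: dataUrl } };
}

// Usage: two agents publish frames; each call replaces only that agent's entry.
let visionState = initialState;
visionState = setAIAgentView(visionState, "agent-1", "data:image/png;base64,AAAA");
visionState = setAIAgentView(visionState, "agent-2", "data:image/png;base64,BBBB");
```

Keying the views by agent ID means a display component can select only the entry it needs, which matters when frames arrive every 100 milliseconds.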
The `VISUAL_UPDATE_INTERVAL_MS` in `useAIAgentController.ts` dictates the frequency of the displayed view updates, providing a near real-time feed. The `DECISION_MAKING_INTERVAL_MS` controls how often the AI processes this visual information (along with chat history) to make decisions and perform actions. This separation ensures responsive visuals without overloading the AI decision-making services.
Together, these steps ensure that the displayed views accurately represent what each AI agent's virtual camera captures from the scene, and that they are updated frequently.
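
The separation between the two intervals can be sketched as a single per-frame check that decides which work is due. This is a minimal illustration, not the hook's actual code: the `AgentTimers` type and the `tick` and `simulate` functions are hypothetical, and only the two constant names and values come from this document.

```typescript
// Values quoted in this document.
const VISUAL_UPDATE_INTERVAL_MS = 100;
const DECISION_MAKING_INTERVAL_MS = 7000;

// Hypothetical bookkeeping: when each task last ran for one agent.
type AgentTimers = { lastVisualMs: number; lastDecisionMs: number };

type DueWork = { updateVisual: boolean; makeDecision: boolean };

// Decides which tasks are due at `nowMs` and advances the timers of those that run.
function tick(timers: AgentTimers, nowMs: number): DueWork {
  const updateVisual = nowMs - timers.lastVisualMs >= VISUAL_UPDATE_INTERVAL_MS;
  const makeDecision = nowMs - timers.lastDecisionMs >= DECISION_MAKING_INTERVAL_MS;
  if (updateVisual) timers.lastVisualMs = nowMs;
  if (makeDecision) timers.lastDecisionMs = nowMs;
  return { updateVisual, makeDecision };
}

// Simulates a frame loop with a fixed frame time and counts how often each task runs.
function simulate(durationMs: number, frameMs: number): { visual: number; decisions: number } {
  const timers: AgentTimers = { lastVisualMs: 0, lastDecisionMs: 0 };
  let visual = 0;
  let decisions = 0;
  for (let now = frameMs; now <= durationMs; now += frameMs) {
    const due = tick(timers, now);
    if (due.updateVisual) visual++;
    if (due.makeDecision) decisions++;
  }
  return { visual, decisions };
}

// With 20 ms frames over 7 seconds: 70 visual updates and 1 decision.
const counts = simulate(7000, 20);
```

One consequence of checking on frame boundaries is that the effective visual rate depends on the frame time: when 100 is not a multiple of the frame time, each update waits for the next frame after the interval elapses, so the real rate falls slightly below 10 FPS.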
- Performance: Very frequent updates (e.g., 30-60 FPS) could impact performance, especially with multiple AI agents. The current interval is a balance between real-time feel and resource usage.
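
To put rough numbers on this trade-off, the sketch below estimates the raw pixel data read back per second for a given update interval. It is a back-of-the-envelope calculation under stated assumptions: 4 bytes per pixel (8-bit RGBA), two agents (inferred from the two on-screen views), and no accounting for PNG encoding or the render pass itself, which may well cost more than the readback. The function name is hypothetical.

```typescript
// Capture resolution quoted in this document.
const CAPTURE_WIDTH = 320;
const CAPTURE_HEIGHT = 200;

// Assumption: pixels are read back as 8-bit RGBA, i.e. 4 bytes per pixel.
const BYTES_PER_PIXEL = 4;

// Raw bytes read back per second across all agents for a given update interval.
function rawReadbackBytesPerSecond(intervalMs: number, agentCount: number): number {
  const bytesPerFrame = CAPTURE_WIDTH * CAPTURE_HEIGHT * BYTES_PER_PIXEL;
  const framesPerSecond = 1000 / intervalMs;
  return bytesPerFrame * framesPerSecond * agentCount;
}

// Current setting: 100 ms interval, assuming two agents.
const currentLoad = rawReadbackBytesPerSecond(100, 2); // 5,120,000 bytes per second

// Hypothetical 60 FPS setting for comparison: six times the current load.
const sixtyFpsLoad = rawReadbackBytesPerSecond(1000 / 60, 2);
```

Because the load scales linearly with both the update rate and the number of agents, moving from 10 FPS to 60 FPS multiplies this cost by six before any encoding work is counted.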