
Conversation

@dangusev
Contributor

@dangusev dangusev commented Nov 7, 2025

Summary by CodeRabbit

  • New Features

    • Added Baseten vision-language plugin: video buffering, frame encoding, streaming responses, and exported VLM.
    • LLMs can receive conversation history at runtime via a public setter.
    • Introduced an LLM error event type.
  • Documentation

    • Added Baseten plugin README and example README.
    • Updated example environment file with Baseten API entries.
  • Chores

    • Added packaging/build config and workspace registration for the Baseten plugin.
  • Tests

    • Updated tests to use the new conversation setter.

Note

Add Baseten VLM plugin with video frame buffering and streaming, introduce LLM.set_conversation wired from Agent.join, and update tests/env/workspace accordingly.

  • Plugins — Baseten VLM (plugins/baseten):
    • New BasetenVLM that buffers video frames, converts to JPEG, and streams responses via OpenAI-compatible client; emits LLMResponseChunkEvent, LLMResponseCompletedEvent, and plugin LLMErrorEvent.
    • Adds README, example app (example/qwen_vl_example.py), packaging (pyproject.toml, py.typed), and export (__init__.py).
  • Core LLM/Agent:
    • Add LLM.set_conversation(conversation) and call it from Agent.join after conversation creation, enabling chat history access (see the sketch after this list).
    • Track instructions/parsed_instructions in LLM; import Instructions type.
  • Tests:
    • Update Anthropic, AWS, Gemini, and OpenRouter tests to use set_conversation instead of direct _conversation assignment.
  • Config/Workspace:
    • Add BASETEN_API_KEY and BASETEN_BASE_URL to .env.example.
    • Register Baseten plugin in workspace (pyproject.toml, uv.lock).
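
A minimal sketch of the new setter and its wiring from Agent.join, assuming simplified class shapes; the _create_conversation helper below is hypothetical and only stands in for the real conversation-creation step:

```python
from typing import Optional


class LLM:
    def __init__(self) -> None:
        self._conversation: Optional["Conversation"] = None  # set once the agent joins

    def set_conversation(self, conversation: "Conversation") -> None:
        # Public setter: gives the LLM access to the chat history at runtime.
        self._conversation = conversation


class Agent:
    async def join(self, call) -> None:
        # ...edge/call setup elided...
        self.conversation = await self._create_conversation(call)  # hypothetical helper
        # Wire the conversation into the LLM so simple_response() can read history.
        self.llm.set_conversation(self.conversation)
```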

Written by Cursor Bugbot for commit 2851eeb. This will update automatically on new commits.

@coderabbitai

coderabbitai bot commented Nov 7, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a Baseten VLM plugin (implementation, events, docs, examples, and packaging), extends the core LLM with instruction fields and a new set_conversation API plus Agent wiring, registers the plugin in the workspace, switches tests to the public setter, and adds two Baseten env vars to .env.example.
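
As a concrete illustration of the credential handling implied by the new env vars and the error messages flagged later in this review, a resolution helper might look like this (the function itself is hypothetical; only the variable names and error strings come from the PR):

```python
import os
from typing import Optional


def resolve_baseten_credentials(
    api_key: Optional[str] = None, base_url: Optional[str] = None
) -> tuple[str, str]:
    """Resolve Baseten credentials from arguments or environment variables."""
    # BASETEN_API_KEY / BASETEN_BASE_URL are the new entries added to .env.example.
    api_key = api_key or os.environ.get("BASETEN_API_KEY")
    base_url = base_url or os.environ.get("BASETEN_BASE_URL")
    if not api_key:
        raise ValueError("api_key must be provided")
    if not base_url:
        raise ValueError("base_url must be provided")
    return api_key, base_url
```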

Changes

Cohort / File(s) — Summary

  • Environment — .env.example: Added BASETEN_API_KEY and BASETEN_BASE_URL.
  • Core LLM refactor — agents-core/vision_agents/core/llm/llm.py: Added instructions: Optional[str], parsed_instructions: Optional[Instructions], and _conversation: Optional[Conversation]; imported Instructions; added set_conversation(self, conversation); removed caching of agent.conversation in _attach_agent.
  • Agent wiring — agents-core/vision_agents/core/agents/agents.py: Calls self.llm.set_conversation(self.conversation) after creating/joining the conversation to wire it into the LLM.
  • Baseten plugin packaging & workspace — plugins/baseten/pyproject.toml, pyproject.toml: New plugin pyproject (Hatch); added plugins/baseten to the workspace and the UV source vision-agents-plugins-baseten.
  • Baseten package export — plugins/baseten/vision_agents/plugins/baseten/__init__.py: Exports BasetenVLM as the public alias VLM and updates __all__.
  • Baseten VLM implementation — plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py: New BasetenVLM (subclass of VideoLLM) with constructor, frame buffering/forwarder, _get_frames_bytes, _frame_to_jpeg_bytes, watch_video_track, and simple_response streaming that emits LLMResponseChunkEvent and LLMResponseCompletedEvent; includes error handling and observability.
  • Baseten plugin events — plugins/baseten/vision_agents/plugins/baseten/events.py: Added LLMErrorEvent dataclass (type="plugin.llm.error", error_message, event_data).
  • Docs & examples — plugins/baseten/README.md, plugins/baseten/example/README.md, plugins/baseten/example/qwen_vl_example.py: Added plugin README, example README, and an example script wiring a Baseten VLM agent with create_agent and join_call entry points.
  • Example packaging — plugins/baseten/example/pyproject.toml: New example project pyproject listing dependencies and workspace sources.
  • Tests: use public setter — plugins/anthropic/tests/test_anthropic_llm.py, plugins/aws/tests/test_aws.py, plugins/gemini/tests/test_gemini_llm.py, plugins/openrouter/tests/test_openrouter_llm.py: Replaced direct _conversation assignments with set_conversation(...) calls; minor formatting tweaks.
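
For orientation, a hedged usage sketch of the new plugin based on the rows above; the constructor parameters mirror the __init__ signature quoted later in this review, while the model name is a placeholder:

```python
from vision_agents.plugins import baseten  # package exports BasetenVLM under the alias VLM

# Placeholder model id; a real deployment name from your Baseten account goes here.
# Assumes BASETEN_API_KEY / BASETEN_BASE_URL are set in the environment.
vlm = baseten.VLM(
    model="<your-vision-language-deployment>",
    fps=1,                    # frames per second sampled from the video track
    frame_buffer_seconds=10,  # how much recent video to keep per request
)
# The example app (plugins/baseten/example/qwen_vl_example.py) then passes this
# instance to the Agent as its llm, alongside edge, STT, and TTS components.
```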

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant User
    participant Agent
    participant LLM
    participant VideoTrack
    participant BasetenAPI
    participant EventBus

    User->>Agent: join()
    Agent->>LLM: set_conversation(conversation)
    User->>LLM: simple_response(text)
    LLM->>VideoTrack: collect buffered frames
    LLM->>BasetenAPI: stream request (text + frames + conv)
    loop streaming
        BasetenAPI-->>LLM: delta chunk
        LLM->>EventBus: emit LLMResponseChunkEvent
    end
    BasetenAPI-->>LLM: stream complete
    LLM->>EventBus: emit LLMResponseCompletedEvent

    alt error
        BasetenAPI-->>LLM: error
        LLM->>EventBus: emit LLMErrorEvent
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus review on:
    • plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py — async streaming, frame capture/encoding, event emission, error handling.
    • agents-core/vision_agents/core/llm/llm.py and agents-core/vision_agents/core/agents/agents.py — instruction parsing, conversation wiring, lifecycle interactions.
    • Packaging/workspace entries (pyproject.toml, plugins/baseten/pyproject.toml) for correctness.

Possibly related PRs

Suggested labels

examples

Suggested reviewers

  • yarikdevcom
  • maxkahan

Poem

I press the film to silence; it answers thin,
a shuttered heart that keeps the room alive.
Conversation blooms in cold black grain within,
the machine translates the dark into a voice —
a slow, exacting light that will not lie.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 51.61%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: The title 'Baseten integration' is concise and directly reflects the primary objective of the PR, which adds a new Baseten plugin to the codebase with LLM/Agent core enhancements.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 290849e and 16c53c8.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • .env.example (1 hunks)
  • agents-core/vision_agents/core/llm/llm.py (2 hunks)
  • plugins/baseten/README.md (1 hunks)
  • plugins/baseten/pyproject.toml (1 hunks)
  • plugins/baseten/vision_agents/plugins/baseten/__init__.py (1 hunks)
  • plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1 hunks)
  • plugins/baseten/vision_agents/plugins/baseten/events.py (1 hunks)
  • pyproject.toml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/baseten/vision_agents/plugins/baseten/__init__.py
  • plugins/baseten/vision_agents/plugins/baseten/events.py
  • agents-core/vision_agents/core/llm/llm.py
  • plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py
🧬 Code graph analysis (4)
plugins/baseten/vision_agents/plugins/baseten/__init__.py (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)
  • BasetenVLM (32-274)
plugins/baseten/vision_agents/plugins/baseten/events.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • PluginBaseEvent (52-54)
agents-core/vision_agents/core/llm/llm.py (1)
agents-core/vision_agents/core/agents/conversation.py (1)
  • Conversation (67-227)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
agents-core/vision_agents/core/llm/llm.py (3)
  • LLMResponseEvent (38-42)
  • VideoLLM (443-464)
  • _conversation (83-86)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
agents-core/vision_agents/core/processors/base_processor.py (1)
  • Processor (35-43)
plugins/baseten/vision_agents/plugins/baseten/events.py (1)
  • LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"

Comment on lines 91 to 161
    async def simple_response(
        self,
        text: str,
        processors: Optional[list[Processor]] = None,
        participant: Optional[Participant] = None,
    ) -> LLMResponseEvent:
        """
        simple_response is a standardized way to create an LLM response.
        This method is also called every time the new STT transcript is received.
        Args:
            text: The text to respond to.
            processors: list of processors (which contain state) about the video/voice AI.
            participant: the Participant object, optional.
        Examples:
            llm.simple_response("say hi to the user, be nice")
        """

        # TODO: Clean up the `_build_enhanced_instructions` and use that. The should be compiled at the agent probably.

        if self._conversation is None:
            # The agent hasn't joined the call yet.
            logger.warning(
                "Cannot create an LLM response - the conversation has not been initialized yet."
            )
            return LLMResponseEvent(original=None, text="")

        messages = []

        # Add Agent's instructions as system prompt.
        if self.instructions:
            messages.append(
                {
                    "role": "system",
                    "content": self.instructions,
                }
            )

        # TODO: Do we need to limit how many messages we send?
        # Add all messages from the conversation to the prompt
        for message in self._conversation.messages:
            messages.append(
                {
                    "role": message.role,
                    "content": message.content,
                }
            )

        # Attach the latest bufferred frames to the request
        frames_data = []
        for frame_bytes in self._get_frames_bytes():
            frame_b64 = base64.b64encode(frame_bytes).decode("utf-8")
            frame_msg = {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
            }
            frames_data.append(frame_msg)

        logger.debug(
            f'Forwarding {len(frames_data)} to the Baseten model "{self.model}"'
        )

        messages.append(
            {
                "role": "user",
                "content": frames_data,
            }
        )


⚠️ Potential issue | 🔴 Critical

Use the user text when building the chat request

text is never added to the messages payload, so the Baseten model only receives stale conversation history + images. That’s a hard correctness bug: it cannot respond to the new prompt. Please include the transcript (and avoid sending an empty content list) before dispatching the request, e.g.:

-        frames_data = []
+        frames_data: list[dict[str, object]] = []
         for frame_bytes in self._get_frames_bytes():
             frame_b64 = base64.b64encode(frame_bytes).decode("utf-8")
             frame_msg = {
                 "type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
             }
             frames_data.append(frame_msg)
 
+        if text:
+            frames_data.insert(0, {"type": "text", "text": text})
+
+        if not frames_data:
+            logger.warning(
+                "Cannot create an LLM response - no prompt text or frames available."
+            )
+            return LLMResponseEvent(original=None, text="")
+
         logger.debug(
             f'Forwarding {len(frames_data)} to the Baseten model "{self.model}"'
         )
 
-        messages.append(
-            {
-                "role": "user",
-                "content": frames_data,
-            }
-        )
+        messages.append({"role": "user", "content": frames_data})


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)

89-158: Critical: User text is never sent to the model.

The text parameter containing the new user prompt is never added to the messages payload. The model only receives conversation history and frames, but cannot respond to the new input. This is a correctness bug that breaks the core functionality.

Apply this diff to include the user text:

-        frames_data = []
+        frames_data: list[dict[str, object]] = []
         for frame_bytes in self._get_frames_bytes():
             frame_b64 = base64.b64encode(frame_bytes).decode("utf-8")
             frame_msg = {
                 "type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
             }
             frames_data.append(frame_msg)
 
+        if text:
+            frames_data.insert(0, {"type": "text", "text": text})
+
+        if not frames_data:
+            logger.warning(
+                "Cannot create an LLM response - no prompt text or frames available."
+            )
+            return LLMResponseEvent(original=None, text="")
+
         logger.debug(
             f'Forwarding {len(frames_data)} to the Baseten model "{self.model}"'
         )
 
         messages.append(
             {
                 "role": "user",
                 "content": frames_data,
             }
         )
🧹 Nitpick comments (5)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)

86-87: Consider making frame dimensions configurable.

The frame dimensions are hardcoded to 800x600. Different models or use cases might benefit from different resolutions. Consider adding frame_width and frame_height as constructor parameters.

Apply this diff to add configurable dimensions:

     def __init__(
         self,
         model: str,
         api_key: Optional[str] = None,
         base_url: Optional[str] = None,
         fps: int = 1,
         frame_buffer_seconds: int = 10,
+        frame_width: int = 800,
+        frame_height: int = 600,
         client: Optional[AsyncOpenAI] = None,
     ):

Then update the initialization:

-        self._frame_width = 800
-        self._frame_height = 600
+        self._frame_width = frame_width
+        self._frame_height = frame_height

92-93: Unused parameter: processors.

The processors parameter is declared but never used in the method. Either utilize it or remove it from the signature.


110-110: Address or remove TODO comment.

The TODO comment references _build_enhanced_instructions, but this method is not present or used. Clarify the intended implementation or remove the comment.


129-129: Consider limiting conversation history size.

The TODO comment raises a valid concern about message volume. Sending unbounded conversation history could lead to token limit errors or increased latency. Consider implementing a sliding window or token-based truncation strategy (a small illustrative sketch follows after these nitpick comments).


276-308: Well-implemented frame conversion utility.

The function correctly handles aspect ratio preservation and uses appropriate resampling quality (LANCZOS). The TODO comment about moving to core utils is valid—this utility could benefit other plugins.

Would you like me to open an issue to track moving this utility to a shared location?
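
Two illustrative sketches for the nitpicks above, neither taken from the PR itself. First, a simple message-count cap for the history-size concern (the threshold and strategy are assumptions; a token-based budget would be the more precise alternative):

```python
def truncate_history(messages: list[dict], max_messages: int = 50) -> list[dict]:
    # Sliding-window truncation: keep only the most recent max_messages entries
    # before appending them to the chat request.
    return messages[-max_messages:]
```

Second, a standalone version of the kind of aspect-ratio-preserving JPEG conversion described in the last comment (a sketch, not the PR's exact implementation; defaults match the hardcoded 800x600 noted earlier):

```python
from io import BytesIO

from PIL import Image


def frame_to_jpeg_bytes(img: Image.Image, max_width: int = 800, max_height: int = 600) -> bytes:
    # Scale down (never up) so the frame fits inside max_width x max_height,
    # preserving aspect ratio, then encode as JPEG.
    scale = min(max_width / img.width, max_height / img.height, 1.0)
    if scale < 1.0:
        new_size = (max(1, int(img.width * scale)), max(1, int(img.height * scale)))
        img = img.resize(new_size, Image.Resampling.LANCZOS)
    buffer = BytesIO()
    img.convert("RGB").save(buffer, format="JPEG")
    return buffer.getvalue()
```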

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a9361a8 and 93cf015.

📒 Files selected for processing (1)
  • plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py
🧬 Code graph analysis (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (450-471)
agents-core/vision_agents/core/processors/base_processor.py (1)
  • Processor (35-43)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
plugins/baseten/vision_agents/plugins/baseten/events.py (1)
  • LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
🔇 Additional comments (2)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (2)

1-28: LGTM!

Imports are well-organized and all appear necessary for the implementation. No sys.path modifications present, adhering to coding guidelines.


263-273: LGTM!

The method correctly iterates over buffered frames and converts them to JPEG bytes. Implementation is clean and well-documented.

Comment on lines +249 to +256
        if not shared_forwarder:
            self._video_forwarder = shared_forwarder or VideoForwarder(
                cast(VideoStreamTrack, track),
                max_buffer=10,
                fps=1.0,  # Low FPS for VLM
                name="baseten_vlm_forwarder",
            )
            await self._video_forwarder.start()


⚠️ Potential issue | 🟠 Major

Fix redundant condition and assignment logic.

The condition if not shared_forwarder: is followed by shared_forwarder or VideoForwarder(...), which is redundant. Additionally, if shared_forwarder is provided, calling await self._video_forwarder.start() may attempt to start an already-running forwarder.

Apply this diff to fix the logic:

         logger.info("🎥 BasetenVLM subscribing to VideoForwarder")
-        if not shared_forwarder:
-            self._video_forwarder = shared_forwarder or VideoForwarder(
+        if shared_forwarder is None:
+            self._video_forwarder = VideoForwarder(
                 cast(VideoStreamTrack, track),
                 max_buffer=10,
                 fps=1.0,  # Low FPS for VLM
                 name="baseten_vlm_forwarder",
             )
             await self._video_forwarder.start()
         else:
             self._video_forwarder = shared_forwarder
🤖 Prompt for AI Agents
In plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py around lines 249
to 256, the check `if not shared_forwarder:` is redundant with using
`shared_forwarder or VideoForwarder(...)` and may call start on an
already-running forwarder; change the logic to assign self._video_forwarder to
shared_forwarder when provided, otherwise create a new VideoForwarder instance,
and call await self._video_forwarder.start() only when you created the new
forwarder (i.e., inside the branch that constructs it).


@cursor cursor bot left a comment


This PR is being reviewed by Cursor Bugbot


"role": "user",
"content": frames_data,
}
)


Bug: User's Voice Silenced in LLM Request

The text parameter containing the user's question is never included in the API request. Only video frames are sent in the final user message, causing the LLM to receive frames without the actual question or prompt. The user's text input needs to be added to the message content alongside the frames.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (3)

31-39: Complete the TODO in the class docstring.

The docstring still contains a TODO placeholder. Please add a brief description of the class purpose (e.g., "A video language model backed by Baseten-hosted models that processes video frames alongside text prompts"), document key parameters, and provide usage guidance.


88-157: CRITICAL: User prompt is never sent to the model.

The text parameter (Line 90) is never added to the messages payload. Only video frames are included in the final user message (Lines 152-157). This is a correctness bug that breaks the core functionality—the model cannot respond to the user's actual question.

Apply this diff to fix:

         # Attach the latest bufferred frames to the request
-        frames_data = []
+        frames_data: list[dict[str, object]] = []
         for frame_bytes in self._get_frames_bytes():
             frame_b64 = base64.b64encode(frame_bytes).decode("utf-8")
             frame_msg = {
                 "type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
             }
             frames_data.append(frame_msg)
 
+        if text:
+            frames_data.insert(0, {"type": "text", "text": text})
+
+        if not frames_data:
+            logger.warning(
+                "Cannot create an LLM response - no prompt text or frames available."
+            )
+            return LLMResponseEvent(original=None, text="")
+
         logger.debug(
             f'Forwarding {len(frames_data)} to the Baseten model "{self.model}"'
         )
 
         messages.append(
             {
                 "role": "user",
                 "content": frames_data,
             }
         )

247-257: Fix redundant condition and avoid starting an already-running forwarder.

The condition if not shared_forwarder: followed by shared_forwarder or VideoForwarder(...) contains dead code—the shared_forwarder or part can never be reached. Additionally, calling await self._video_forwarder.start() when shared_forwarder is provided may attempt to start an already-running forwarder.

Apply this diff:

         logger.info("🎥 BasetenVLM subscribing to VideoForwarder")
-        if not shared_forwarder:
-            self._video_forwarder = shared_forwarder or VideoForwarder(
+        if shared_forwarder is None:
+            self._video_forwarder = VideoForwarder(
                 cast(VideoStreamTrack, track),
                 max_buffer=10,
                 fps=1.0,  # Low FPS for VLM
                 name="baseten_vlm_forwarder",
             )
             await self._video_forwarder.start()
         else:
             self._video_forwarder = shared_forwarder
🧹 Nitpick comments (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)

70-73: Enhance credential error messages.

The error messages for missing credentials could be more helpful by mentioning the environment variable names.

Apply this diff:

         elif not api_key:
-            raise ValueError("api_key must be provided")
+            raise ValueError("api_key must be provided or set via BASETEN_API_KEY environment variable")
         elif not base_url:
-            raise ValueError("base_url must be provided")
+            raise ValueError("base_url must be provided or set via BASETEN_BASE_URL environment variable")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 029852f and 2851eeb.

📒 Files selected for processing (4)
  • plugins/baseten/example/README.md (1 hunks)
  • plugins/baseten/example/pyproject.toml (1 hunks)
  • plugins/baseten/example/qwen_vl_example.py (1 hunks)
  • plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • plugins/baseten/example/README.md
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/baseten/example/qwen_vl_example.py (4)
agents-core/vision_agents/core/agents/agents.py (3)
  • Agent (125-1357)
  • create_call (698-703)
  • finish (558-591)
agents-core/vision_agents/core/edge/types.py (1)
  • User (15-18)
agents-core/vision_agents/core/agents/agent_launcher.py (1)
  • AgentLauncher (18-117)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)
  • simple_response (88-223)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (450-471)
agents-core/vision_agents/core/processors/base_processor.py (1)
  • Processor (35-43)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
plugins/baseten/vision_agents/plugins/baseten/events.py (1)
  • LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
🔇 Additional comments (10)
plugins/baseten/example/pyproject.toml (2)

1-13: LGTM!

Project metadata and dependencies align correctly with the example script's imports.


15-20: LGTM!

Workspace source configuration correctly references all plugin dependencies.

plugins/baseten/example/qwen_vl_example.py (4)

1-11: LGTM!

Imports and environment loading are correct.


14-28: LGTM!

Agent construction correctly wires Baseten VLM with edge, STT, TTS, and appropriate instructions.


30-43: Verify that the critical bug in baseten_vlm.py was addressed.

The join flow is correctly structured. However, Line 38 calls agent.simple_response("Describe what you currently see"), which relies on BasetenVLM's simple_response method. Past review comments identified a critical bug where the text parameter is never added to the messages payload (lines 88-157 in baseten_vlm.py), meaning the prompt won't reach the model. Please ensure this bug was fixed before merging.


45-46: LGTM!

CLI entry point correctly wires the AgentLauncher.

plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (4)

1-28: LGTM!

Imports and constants are correctly defined.


159-223: LGTM!

Streaming response logic correctly handles API calls, error reporting via LLMErrorEvent, and emits appropriate chunk and completion events.


262-272: LGTM!

The frame iterator correctly processes buffered frames.


275-307: LGTM!

Frame-to-JPEG conversion correctly maintains aspect ratio and uses appropriate resampling. The TODO comment about moving to core utils is a valid future refactoring consideration.
