Baseten integration #156
base: main
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
Adds a Baseten VLM plugin (implementation, events, docs, examples, and packaging), extends core LLM with instruction fields and a new set_conversation API plus Agent wiring, updates workspace and tests to use the public setter, and adds two Baseten env vars to .env.example.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant Agent
    participant LLM
    participant VideoTrack
    participant BasetenAPI
    participant EventBus
    User->>Agent: join()
    Agent->>LLM: set_conversation(conversation)
    User->>LLM: simple_response(text)
    LLM->>VideoTrack: collect buffered frames
    LLM->>BasetenAPI: stream request (text + frames + conv)
    loop streaming
        BasetenAPI-->>LLM: delta chunk
        LLM->>EventBus: emit LLMResponseChunkEvent
    end
    BasetenAPI-->>LLM: stream complete
    LLM->>EventBus: emit LLMResponseCompletedEvent
    alt error
        BasetenAPI-->>LLM: error
        LLM->>EventBus: emit LLMErrorEvent
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
- uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
- .env.example (1 hunks)
- agents-core/vision_agents/core/llm/llm.py (2 hunks)
- plugins/baseten/README.md (1 hunks)
- plugins/baseten/pyproject.toml (1 hunks)
- plugins/baseten/vision_agents/plugins/baseten/__init__.py (1 hunks)
- plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1 hunks)
- plugins/baseten/vision_agents/plugins/baseten/events.py (1 hunks)
- pyproject.toml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/baseten/vision_agents/plugins/baseten/__init__.py
- plugins/baseten/vision_agents/plugins/baseten/events.py
- agents-core/vision_agents/core/llm/llm.py
- plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py
🧬 Code graph analysis (4)
plugins/baseten/vision_agents/plugins/baseten/__init__.py (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)
BasetenVLM(32-274)
plugins/baseten/vision_agents/plugins/baseten/events.py (1)
agents-core/vision_agents/core/events/base.py (1)
PluginBaseEvent(52-54)
agents-core/vision_agents/core/llm/llm.py (1)
agents-core/vision_agents/core/agents/conversation.py (1)
Conversation(67-227)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
agents-core/vision_agents/core/llm/llm.py (3)
LLMResponseEvent (38-42), VideoLLM (443-464), _conversation (83-86)
agents-core/vision_agents/core/llm/events.py (2)
LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
VideoForwarder (14-195), start_event_consumer (109-195)
agents-core/vision_agents/core/processors/base_processor.py (1)
Processor (35-43)
plugins/baseten/vision_agents/plugins/baseten/events.py (1)
LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
```python
    async def simple_response(
        self,
        text: str,
        processors: Optional[list[Processor]] = None,
        participant: Optional[Participant] = None,
    ) -> LLMResponseEvent:
        """
        simple_response is a standardized way to create an LLM response.

        This method is also called every time the new STT transcript is received.

        Args:
            text: The text to respond to.
            processors: list of processors (which contain state) about the video/voice AI.
            participant: the Participant object, optional.

        Examples:
            llm.simple_response("say hi to the user, be nice")
        """
        # TODO: Clean up the `_build_enhanced_instructions` and use that. The should be compiled at the agent probably.

        if self._conversation is None:
            # The agent hasn't joined the call yet.
            logger.warning(
                "Cannot create an LLM response - the conversation has not been initialized yet."
            )
            return LLMResponseEvent(original=None, text="")

        messages = []

        # Add Agent's instructions as system prompt.
        if self.instructions:
            messages.append(
                {
                    "role": "system",
                    "content": self.instructions,
                }
            )

        # TODO: Do we need to limit how many messages we send?
        # Add all messages from the conversation to the prompt
        for message in self._conversation.messages:
            messages.append(
                {
                    "role": message.role,
                    "content": message.content,
                }
            )

        # Attach the latest bufferred frames to the request
        frames_data = []
        for frame_bytes in self._get_frames_bytes():
            frame_b64 = base64.b64encode(frame_bytes).decode("utf-8")
            frame_msg = {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
            }
            frames_data.append(frame_msg)

        logger.debug(
            f'Forwarding {len(frames_data)} to the Baseten model "{self.model}"'
        )

        messages.append(
            {
                "role": "user",
                "content": frames_data,
            }
        )
```
Use the user text when building the chat request
text is never added to the messages payload, so the Baseten model only receives stale conversation history + images. That’s a hard correctness bug: it cannot respond to the new prompt. Please include the transcript (and avoid sending an empty content list) before dispatching the request, e.g.:
```diff
-        frames_data = []
+        frames_data: list[dict[str, object]] = []
         for frame_bytes in self._get_frames_bytes():
             frame_b64 = base64.b64encode(frame_bytes).decode("utf-8")
             frame_msg = {
                 "type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
             }
             frames_data.append(frame_msg)
+        if text:
+            frames_data.insert(0, {"type": "text", "text": text})
+
+        if not frames_data:
+            logger.warning(
+                "Cannot create an LLM response - no prompt text or frames available."
+            )
+            return LLMResponseEvent(original=None, text="")
+
         logger.debug(
             f'Forwarding {len(frames_data)} to the Baseten model "{self.model}"'
         )
-        messages.append(
-            {
-                "role": "user",
-                "content": frames_data,
-            }
-        )
+        messages.append({"role": "user", "content": frames_data})
```
Actionable comments posted: 2
♻️ Duplicate comments (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)
89-158: Critical: User text is never sent to the model.

The text parameter containing the new user prompt is never added to the messages payload. The model only receives conversation history and frames, but cannot respond to the new input. This is a correctness bug that breaks the core functionality.

Apply this diff to include the user text:

```diff
-        frames_data = []
+        frames_data: list[dict[str, object]] = []
         for frame_bytes in self._get_frames_bytes():
             frame_b64 = base64.b64encode(frame_bytes).decode("utf-8")
             frame_msg = {
                 "type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
             }
             frames_data.append(frame_msg)
+        if text:
+            frames_data.insert(0, {"type": "text", "text": text})
+
+        if not frames_data:
+            logger.warning(
+                "Cannot create an LLM response - no prompt text or frames available."
+            )
+            return LLMResponseEvent(original=None, text="")
+
         logger.debug(
             f'Forwarding {len(frames_data)} to the Baseten model "{self.model}"'
         )
         messages.append(
             {
                 "role": "user",
                 "content": frames_data,
             }
         )
```
🧹 Nitpick comments (5)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
86-87: Consider making frame dimensions configurable.

The frame dimensions are hardcoded to 800x600. Different models or use cases might benefit from different resolutions. Consider adding frame_width and frame_height as constructor parameters.

Apply this diff to add configurable dimensions:

```diff
 def __init__(
     self,
     model: str,
     api_key: Optional[str] = None,
     base_url: Optional[str] = None,
     fps: int = 1,
     frame_buffer_seconds: int = 10,
+    frame_width: int = 800,
+    frame_height: int = 600,
     client: Optional[AsyncOpenAI] = None,
 ):
```

Then update the initialization:

```diff
-        self._frame_width = 800
-        self._frame_height = 600
+        self._frame_width = frame_width
+        self._frame_height = frame_height
```
92-93: Unused parameter: processors.

The processors parameter is declared but never used in the method. Either utilize it or remove it from the signature.
110-110: Address or remove TODO comment.

The TODO comment references _build_enhanced_instructions, but this method is not present or used. Clarify the intended implementation or remove the comment.
129-129: Consider limiting conversation history size.

The TODO comment raises a valid concern about message volume. Sending unbounded conversation history could lead to token limit errors or increased latency. Consider implementing a sliding window or token-based truncation strategy, for example the sketch below.
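A minimal sketch of such a sliding-window truncation, for illustration only; the helper name and the 20-message cap are assumptions, not part of this PR:

```python
# Illustrative only: keep the system prompt plus the most recent messages.
# MAX_HISTORY_MESSAGES and the helper name are assumptions, not part of this PR.
MAX_HISTORY_MESSAGES = 20


def truncate_history(messages: list[dict], max_messages: int = MAX_HISTORY_MESSAGES) -> list[dict]:
    """Return the system prompt (if any) plus the newest messages, capped at max_messages."""
    if len(messages) <= max_messages:
        return messages
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    keep = max(max_messages - len(system), 0)
    return system + (rest[-keep:] if keep else [])
```

The VLM could apply something like this to the messages built from self._conversation.messages before attaching frames; a token-count-based cutoff would slot in the same way.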
276-308: Well-implemented frame conversion utility.

The function correctly handles aspect ratio preservation and uses appropriate resampling quality (LANCZOS). The TODO comment about moving to core utils is valid: this utility could benefit other plugins (a rough sketch of such a shared helper follows below).

Would you like me to open an issue to track moving this utility to a shared location?
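The helper itself is not reproduced in this view; a rough sketch of what a shared frame-to-JPEG utility could look like, assuming PIL and the 800x600 defaults mentioned above (the plugin's actual signature may differ):

```python
import io

from PIL import Image


def frame_to_jpeg(image: Image.Image, max_width: int = 800, max_height: int = 600, quality: int = 85) -> bytes:
    """Downscale to fit within max_width x max_height, preserving aspect ratio, and encode as JPEG."""
    scale = min(max_width / image.width, max_height / image.height, 1.0)
    if scale < 1.0:
        # LANCZOS resampling keeps downscaled frames sharp.
        image = image.resize((int(image.width * scale), int(image.height * scale)), Image.Resampling.LANCZOS)
    buf = io.BytesIO()
    image.convert("RGB").save(buf, format="JPEG", quality=quality)
    return buf.getvalue()
```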
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py
🧬 Code graph analysis (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
agents-core/vision_agents/core/llm/events.py (2)
LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
LLMResponseEvent (38-42), VideoLLM (450-471)
agents-core/vision_agents/core/processors/base_processor.py (1)
Processor (35-43)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
VideoForwarder (14-195), start_event_consumer (109-195)
plugins/baseten/vision_agents/plugins/baseten/events.py (1)
LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (2)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (2)
1-28: LGTM! Imports are well-organized and all appear necessary for the implementation. No sys.path modifications present, adhering to coding guidelines.

263-273: LGTM! The method correctly iterates over buffered frames and converts them to JPEG bytes. Implementation is clean and well-documented.
```python
        if not shared_forwarder:
            self._video_forwarder = shared_forwarder or VideoForwarder(
                cast(VideoStreamTrack, track),
                max_buffer=10,
                fps=1.0,  # Low FPS for VLM
                name="baseten_vlm_forwarder",
            )
            await self._video_forwarder.start()
```
Fix redundant condition and assignment logic.
The condition if not shared_forwarder: is followed by shared_forwarder or VideoForwarder(...), which is redundant. Additionally, if shared_forwarder is provided, calling await self._video_forwarder.start() may attempt to start an already-running forwarder.
Apply this diff to fix the logic:
```diff
         logger.info("🎥 BasetenVLM subscribing to VideoForwarder")
-        if not shared_forwarder:
-            self._video_forwarder = shared_forwarder or VideoForwarder(
+        if shared_forwarder is None:
+            self._video_forwarder = VideoForwarder(
                 cast(VideoStreamTrack, track),
                 max_buffer=10,
                 fps=1.0,  # Low FPS for VLM
                 name="baseten_vlm_forwarder",
             )
             await self._video_forwarder.start()
+        else:
+            self._video_forwarder = shared_forwarder
```

🤖 Prompt for AI Agents
In plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py around lines 249
to 256, the check `if not shared_forwarder:` is redundant with using
`shared_forwarder or VideoForwarder(...)` and may call start on an
already-running forwarder; change the logic to assign self._video_forwarder to
shared_forwarder when provided, otherwise create a new VideoForwarder instance,
and call await self._video_forwarder.start() only when you created the new
forwarder (i.e., inside the branch that constructs it).
This PR is being reviewed by Cursor Bugbot
```python
                "role": "user",
                "content": frames_data,
            }
        )
```
Bug: User's Voice Silenced in LLM Request
The text parameter containing the user's question is never included in the API request. Only video frames are sent in the final user message, causing the LLM to receive frames without the actual question or prompt. The user's text input needs to be added to the message content alongside the frames.
Actionable comments posted: 0
♻️ Duplicate comments (3)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (3)
31-39: Complete the TODO in the class docstring.

The docstring still contains a TODO placeholder. Please add a brief description of the class purpose (e.g., "A video language model backed by Baseten-hosted models that processes video frames alongside text prompts"), document key parameters, and provide usage guidance, as sketched below.
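For illustration, one possible Google-style docstring; the wording is a suggestion, the parameter list mirrors the constructor shown in the earlier diff, and the example model name is an assumption:

```python
class BasetenVLM(VideoLLM):
    """Video language model backed by a Baseten-hosted, OpenAI-compatible endpoint.

    Buffers recent video frames, attaches them as JPEG data URLs to chat requests
    together with the conversation history, and streams the model's reply.

    Args:
        model: Name of the Baseten-deployed model to call.
        api_key: Baseten API key; falls back to the BASETEN_API_KEY env var.
        base_url: Baseten endpoint URL; falls back to BASETEN_BASE_URL.
        fps: Rate at which video frames are sampled into the buffer.
        frame_buffer_seconds: How many seconds of frames to keep buffered.
        client: Optional pre-configured AsyncOpenAI client.

    Example:
        >>> vlm = BasetenVLM(model="qwen-vl")  # model name is illustrative
        >>> await vlm.simple_response("Describe what you currently see")
    """
```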
88-157: CRITICAL: User prompt is never sent to the model.

The text parameter (Line 90) is never added to the messages payload. Only video frames are included in the final user message (Lines 152-157). This is a correctness bug that breaks the core functionality: the model cannot respond to the user's actual question.

Apply this diff to fix:

```diff
         # Attach the latest bufferred frames to the request
-        frames_data = []
+        frames_data: list[dict[str, object]] = []
         for frame_bytes in self._get_frames_bytes():
             frame_b64 = base64.b64encode(frame_bytes).decode("utf-8")
             frame_msg = {
                 "type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
             }
             frames_data.append(frame_msg)
+        if text:
+            frames_data.insert(0, {"type": "text", "text": text})
+
+        if not frames_data:
+            logger.warning(
+                "Cannot create an LLM response - no prompt text or frames available."
+            )
+            return LLMResponseEvent(original=None, text="")
+
         logger.debug(
             f'Forwarding {len(frames_data)} to the Baseten model "{self.model}"'
         )
         messages.append(
             {
                 "role": "user",
                 "content": frames_data,
             }
         )
```
247-257: Fix redundant condition and avoid starting an already-running forwarder.

The condition if not shared_forwarder: followed by shared_forwarder or VideoForwarder(...) contains dead code: the shared_forwarder or part can never be reached. Additionally, calling await self._video_forwarder.start() when shared_forwarder is provided may attempt to start an already-running forwarder.

Apply this diff:

```diff
         logger.info("🎥 BasetenVLM subscribing to VideoForwarder")
-        if not shared_forwarder:
-            self._video_forwarder = shared_forwarder or VideoForwarder(
+        if shared_forwarder is None:
+            self._video_forwarder = VideoForwarder(
                 cast(VideoStreamTrack, track),
                 max_buffer=10,
                 fps=1.0,  # Low FPS for VLM
                 name="baseten_vlm_forwarder",
             )
             await self._video_forwarder.start()
+        else:
+            self._video_forwarder = shared_forwarder
```
🧹 Nitpick comments (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)
70-73: Enhance credential error messages.

The error messages for missing credentials could be more helpful by mentioning the environment variable names.

Apply this diff:

```diff
     elif not api_key:
-        raise ValueError("api_key must be provided")
+        raise ValueError("api_key must be provided or set via BASETEN_API_KEY environment variable")
     elif not base_url:
-        raise ValueError("base_url must be provided")
+        raise ValueError("base_url must be provided or set via BASETEN_BASE_URL environment variable")
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- plugins/baseten/example/README.md (1 hunks)
- plugins/baseten/example/pyproject.toml (1 hunks)
- plugins/baseten/example/qwen_vl_example.py (1 hunks)
- plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- plugins/baseten/example/README.md
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/baseten/example/qwen_vl_example.py (4)
agents-core/vision_agents/core/agents/agents.py (3)
Agent (125-1357), create_call (698-703), finish (558-591)
agents-core/vision_agents/core/edge/types.py (1)
User (15-18)
agents-core/vision_agents/core/agents/agent_launcher.py (1)
AgentLauncher (18-117)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)
simple_response (88-223)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
agents-core/vision_agents/core/llm/events.py (2)
LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
LLMResponseEvent (38-42), VideoLLM (450-471)
agents-core/vision_agents/core/processors/base_processor.py (1)
Processor (35-43)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
VideoForwarder (14-195), start_event_consumer (109-195)
plugins/baseten/vision_agents/plugins/baseten/events.py (1)
LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (10)
plugins/baseten/example/pyproject.toml (2)
1-13: LGTM! Project metadata and dependencies align correctly with the example script's imports.

15-20: LGTM! Workspace source configuration correctly references all plugin dependencies.
plugins/baseten/example/qwen_vl_example.py (4)
1-11: LGTM! Imports and environment loading are correct.

14-28: LGTM! Agent construction correctly wires Baseten VLM with edge, STT, TTS, and appropriate instructions.

30-43: Verify that the critical bug in baseten_vlm.py was addressed. The join flow is correctly structured. However, Line 38 calls agent.simple_response("Describe what you currently see"), which relies on BasetenVLM's simple_response method. Past review comments identified a critical bug where the text parameter is never added to the messages payload (lines 88-157 in baseten_vlm.py), meaning the prompt won't reach the model. Please ensure this bug was fixed before merging.

45-46: LGTM! CLI entry point correctly wires the AgentLauncher.
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (4)
1-28: LGTM! Imports and constants are correctly defined.

159-223: LGTM! Streaming response logic correctly handles API calls, error reporting via LLMErrorEvent, and emits appropriate chunk and completion events (the sketch after this list shows the general shape of such a flow).

262-272: LGTM! The frame iterator correctly processes buffered frames.

275-307: LGTM! Frame-to-JPEG conversion correctly maintains aspect ratio and uses appropriate resampling. The TODO comment about moving to core utils is a valid future refactoring consideration.
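For reference, the general shape of such a streaming flow, as a hedged sketch only: the event constructor fields, the events.send bus call, and the method name are assumptions, not the plugin's actual code.

```python
# Hedged sketch of the streaming flow (not the plugin's actual implementation).
async def _stream_baseten_response(self, messages: list[dict]) -> LLMResponseEvent:
    text_parts: list[str] = []
    try:
        stream = await self.client.chat.completions.create(
            model=self.model, messages=messages, stream=True
        )
        async for chunk in stream:
            delta = chunk.choices[0].delta.content or ""
            if delta:
                text_parts.append(delta)
                self.events.send(LLMResponseChunkEvent(delta=delta))  # per-chunk event
        full_text = "".join(text_parts)
        self.events.send(LLMResponseCompletedEvent(text=full_text))  # completion event
        return LLMResponseEvent(original=None, text=full_text)
    except Exception as exc:
        self.events.send(LLMErrorEvent(error=str(exc)))  # plugin-level error event
        return LLMResponseEvent(original=None, text="")
```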
Summary by CodeRabbit
New Features
Documentation
Chores
Tests
Note
Add Baseten VLM plugin with video frame buffering and streaming, introduce LLM.set_conversation wired from Agent.join, and update tests/env/workspace accordingly.
- Plugin (plugins/baseten): BasetenVLM that buffers video frames, converts to JPEG, and streams responses via OpenAI-compatible client; emits LLMResponseChunkEvent, LLMResponseCompletedEvent, and plugin LLMErrorEvent.
- Example (example/qwen_vl_example.py), packaging (pyproject.toml, py.typed), and export (__init__.py).
- Add LLM.set_conversation(conversation) and call it from Agent.join after conversation creation, enabling chat history access (a minimal sketch follows at the end of this note).
- Add instructions/parsed_instructions in LLM; import Instructions type.
- Tests use set_conversation instead of direct _conversation assignment.
- Add BASETEN_API_KEY and BASETEN_BASE_URL to .env.example.
- Workspace updates (pyproject.toml, uv.lock).

Written by Cursor Bugbot for commit 2851eeb. This will update automatically on new commits. Configure here.
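A minimal sketch of the set_conversation wiring this note describes; the Agent and Conversation shapes below are assumptions based on the summary, not the repository's actual classes:

```python
# Minimal sketch only: illustrates the public setter replacing direct
# _conversation assignment; class shapes are assumptions.
from typing import Optional


class Conversation:
    """Placeholder for the shared chat-history object."""

    def __init__(self) -> None:
        self.messages: list[dict] = []


class LLM:
    def __init__(self) -> None:
        self._conversation: Optional[Conversation] = None

    def set_conversation(self, conversation: Conversation) -> None:
        """Attach the conversation so responses can include chat history."""
        self._conversation = conversation


class Agent:
    def __init__(self, llm: LLM) -> None:
        self.llm = llm

    async def join(self, call: object) -> None:
        conversation = Conversation()            # created once the agent joins the call
        self.llm.set_conversation(conversation)  # public setter instead of touching _conversation
```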