Feat: Add support for Moondream VLM functions #154
Conversation
Walkthrough
Adds Moondream VLM support: new CloudVLM and LocalVLM implementations (streaming VQA/caption), STT integration and device selection, examples and packaging, updated exports/imports, agent warmup extension, and integration tests for cloud and local flows.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Agent
    participant CloudVLM
    participant STT as STT Service
    participant Moondream as Moondream SDK
    participant Events as Event Bus

    Agent->>CloudVLM: watch_video_track(track)
    activate CloudVLM
    CloudVLM->>CloudVLM: setup forwarder & callbacks
    CloudVLM->>CloudVLM: _setup_stt_subscription()
    note right of CloudVLM: subscribe to STT transcripts
    rect rgb(245,245,255)
        Agent->>CloudVLM: frame delivered (_on_frame_received)
        CloudVLM->>CloudVLM: buffer frame, update _latest_frame
    end
    rect rgb(255,245,250)
        STT->>CloudVLM: transcript event
        CloudVLM->>CloudVLM: _process_frame(transcript)
    end
    rect rgb(240,255,240)
        CloudVLM->>Moondream: query(image,text) / caption(image) [stream=true]
        activate Moondream
        Moondream-->>CloudVLM: streaming generator (chunks)
        deactivate Moondream
        CloudVLM->>CloudVLM: _consume_stream -> assemble text
        loop stream chunks
            CloudVLM->>Events: emit LLMResponseChunkEvent
        end
        CloudVLM->>Events: emit LLMResponseCompletedEvent
        CloudVLM-->>Agent: return LLMResponseEvent
    end
    Agent->>CloudVLM: close()
    deactivate CloudVLM
```
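The flow above corresponds roughly to the following usage. This is a minimal sketch based on the class and method names summarized in this review (`CloudVLM`, `watch_video_track`, `simple_response`, `close`); the import path and constructor arguments are assumptions, not the exact example shipped in the PR.

```python
import asyncio

from vision_agents.plugins import moondream  # assumed import path for the plugin package


async def main() -> None:
    # "vqa" answers spoken questions about the frame; "caption" describes the frame.
    vlm = moondream.CloudVLM(api_key="YOUR_MOONDREAM_API_KEY", mode="vqa")

    # In a real agent the track comes from the call/edge layer; this is a placeholder.
    video_track = ...  # an aiortc MediaStreamTrack carrying the participant's video

    # Buffers frames (~1 fps) and reacts to STT transcripts as shown in the diagram.
    await vlm.watch_video_track(video_track)

    # The same pipeline can also be driven directly against the latest buffered frame.
    response = await vlm.simple_response("What is the person in the frame doing?")
    print(response.text)

    vlm.close()


if __name__ == "__main__":
    asyncio.run(main())
```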
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
📜 Recent review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
- plugins/moondream/example/moondream_vlm_example.py (1 hunks)
- plugins/moondream/example/pyproject.toml (1 hunks)
- plugins/moondream/tests/test_moondream_vlm.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/__init__.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/moondream_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/moondream/example/moondream_vlm_example.py
- plugins/moondream/tests/test_moondream_vlm.py
- plugins/moondream/vision_agents/plugins/moondream/__init__.py
- plugins/moondream/vision_agents/plugins/moondream/moondream_vlm.py
🧬 Code graph analysis (4)
plugins/moondream/example/moondream_vlm_example.py (3)
agents-core/vision_agents/core/edge/types.py (1): User (15-18)
agents-core/vision_agents/core/agents/agents.py (1): Agent (126-1356)
plugins/moondream/vision_agents/plugins/moondream/moondream_vlm.py (1): CloudVLM (27-252)
plugins/moondream/tests/test_moondream_vlm.py (1)
plugins/moondream/vision_agents/plugins/moondream/moondream_vlm.py (3): CloudVLM (27-252), close (247-252), simple_response (203-224)
plugins/moondream/vision_agents/plugins/moondream/__init__.py (1)
plugins/moondream/vision_agents/plugins/moondream/moondream_vlm.py (1)
CloudVLM(27-252)
plugins/moondream/vision_agents/plugins/moondream/moondream_vlm.py (5)
agents-core/vision_agents/core/stt/events.py (1): STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (14-195), start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2): LatestNQueue (6-28), put_latest_nowait (22-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- agents-core/vision_agents/core/agents/agents.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/__init__.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- agents-core/vision_agents/core/agents/agents.py
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/moondream/vision_agents/plugins/moondream/__init__.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py
🧬 Code graph analysis (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (6)
agents-core/vision_agents/core/stt/events.py (1): STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (14-195), start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2): LatestNQueue (6-28), put_latest_nowait (22-28)
agents-core/vision_agents/core/agents/agents.py (1): subscribe (308-320)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
🧬 Code graph analysis (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (6)
agents-core/vision_agents/core/agents/agents.py (2): AgentOptions (93-103), default_agent_options (110-111)
agents-core/vision_agents/core/stt/events.py (1): STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (14-195), start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2): LatestNQueue (6-28), put_latest_nowait (22-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
Actionable comments posted: 2
♻️ Duplicate comments (3)
plugins/moondream/example/moondream_vlm_example.py (1)
46-50: Fix the async context manager usage. Line 46 uses `with await agent.join(call):`, which will fail at runtime. The `join()` method returns an async context manager, so you must use `async with` instead of `with await`. Apply this diff:

```diff
-    with await agent.join(call):
+    async with agent.join(call):
         # Open the demo UI
         await agent.edge.open_demo(call)
         # run till the call ends
         await agent.finish()
```

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1)
110-112: Guard the STT subscriber against non-transcript events. The `@self.agent.events.subscribe` decorator subscribes to all events on the agent's event bus, not just `STTTranscriptEvent`. When non-STT events arrive (e.g., LLM or TTS events), accessing `event.text` will raise `AttributeError`, crashing the VLM. Apply this diff:

```diff
 @self.agent.events.subscribe
 async def on_stt_transcript(event: STTTranscriptEvent):
+    if not isinstance(event, STTTranscriptEvent):
+        return
     await self._on_stt_transcript(event)
```

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1)
185-220: Don't terminate a borrowed VideoForwarder. When `watch_video_track` receives a `shared_forwarder`, `_stop_watching_video_track` (lines 360-365) unconditionally calls `stop()` on it, terminating the shared forwarder for all other consumers. Track whether LocalVLM owns the forwarder and only stop it when owned. Apply this diff:

```diff
 def __init__(...):
     ...
     self._video_forwarder: Optional[VideoForwarder] = None
+    self._owns_video_forwarder = False
     ...

 async def watch_video_track(...):
     ...
     if shared_forwarder is not None:
         self._video_forwarder = shared_forwarder
+        self._owns_video_forwarder = False
         ...
     else:
         self._video_forwarder = VideoForwarder(...)
+        self._owns_video_forwarder = True
         ...

 async def _stop_watching_video_track(self) -> None:
     if self._video_forwarder is not None:
-        await self._video_forwarder.stop()
+        if self._owns_video_forwarder:
+            await self._video_forwarder.stop()
+        else:
+            logger.debug("Shared VideoForwarder left running; owner controls lifecycle")
         self._video_forwarder = None
```
🧹 Nitpick comments (1)
plugins/moondream/tests/test_moondream_local_vlm.py (1)
64-82: Consider providing a test helper for frame assignment. Lines 69 and 72 call warmup and directly assign to `_latest_frame` (a private attribute). While this works, consider adding a public test helper method like `set_test_frame(frame)` in LocalVLM to avoid reaching into private state. Apply this pattern if you choose to add a helper:

In `moondream_local_vlm.py`:

```python
def set_test_frame(self, frame: av.VideoFrame) -> None:
    """Set frame for testing purposes. Not for production use."""
    self._latest_frame = frame
```

Then in the test:

```python
await local_vlm_vqa.warmup()
local_vlm_vqa.set_test_frame(golf_frame)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
- agents-core/vision_agents/core/agents/agent_launcher.py (1 hunks)
- plugins/moondream/example/moondream_vlm_example.py (1 hunks)
- plugins/moondream/tests/test_moondream_local_vlm.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/__init__.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py
- agents-core/vision_agents/core/agents/agent_launcher.py
- plugins/moondream/vision_agents/plugins/moondream/__init__.py
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
- plugins/moondream/tests/test_moondream_local_vlm.py
- plugins/moondream/example/moondream_vlm_example.py
🧬 Code graph analysis (8)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (1)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
MoondreamVideoTrack(16-76)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
MoondreamVideoTrack(16-76)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (6)
agents-core/vision_agents/core/stt/events.py (1): STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (14-195), start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2): LatestNQueue (6-28), put_latest_nowait (22-28)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (9): watch_video_track (185-220), _on_frame_received (222-228), _setup_stt_subscription (230-237), on_stt_transcript (236-237), _on_stt_transcript (330-335), _consume_stream (239-252), _process_frame (254-328), simple_response (337-358), close (367-375)
agents-core/vision_agents/core/agents/agent_launcher.py (4)
agents-core/vision_agents/core/llm/llm.py (1): warmup (63-71)
agents-core/vision_agents/core/stt/stt.py (1): warmup (39-47)
agents-core/vision_agents/core/turn_detection/turn_detection.py (1): warmup (36-44)
agents-core/vision_agents/core/tts/tts.py (1): warmup (73-81)
plugins/moondream/vision_agents/plugins/moondream/__init__.py (2)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1): CloudVLM (27-250)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1): LocalVLM (29-375)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (7)
agents-core/vision_agents/core/agents/agents.py (2): AgentOptions (93-103), default_agent_options (110-111)
agents-core/vision_agents/core/stt/events.py (1): STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (14-195), start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2): LatestNQueue (6-28), put_latest_nowait (22-28)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (6): watch_video_track (60-95), _stop_watching_video_track (224-229), _on_frame_received (97-103), _setup_stt_subscription (105-112), _consume_stream (114-127), _process_frame (129-192)
plugins/moondream/tests/test_moondream_local_vlm.py (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (4): LocalVLM (29-375), warmup (98-101), close (367-375), simple_response (337-358)
plugins/moondream/example/moondream_vlm_example.py (4)
agents-core/vision_agents/core/edge/types.py (1): User (15-18)
agents-core/vision_agents/core/agents/agents.py (3): Agent (126-1356), create_call (697-702), finish (557-590)
agents-core/vision_agents/core/agents/agent_launcher.py (1): AgentLauncher (18-124)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1): CloudVLM (27-250)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (13)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)
23-23: LGTM: Import path correctly updated. The import path now reflects the reorganized module structure with MoondreamVideoTrack under the detection subpackage.
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (1)
19-19: LGTM: Import path correctly updated. Consistent with the module reorganization, aligning with the detection subpackage structure.
plugins/moondream/vision_agents/plugins/moondream/__init__.py (2)
8-12: LGTM: Imports correctly updated to absolute paths. The imports now use absolute paths and include the new CloudVLM and LocalVLM classes, properly reflecting the module reorganization.
19-20: LGTM: Public API correctly expanded. CloudVLM and LocalVLM are now properly exposed in the package's public API.
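For illustration, the expanded public API would let downstream code import both classes directly from the plugin package; a sketch, assuming the exports and constructor parameters described elsewhere in this review:

```python
from vision_agents.plugins.moondream import CloudVLM, LocalVLM

# Hosted Moondream API vs. on-device model; both behave as VideoLLM implementations.
cloud_vlm = CloudVLM(mode="caption")              # reads MOONDREAM_API_KEY from the environment
local_vlm = LocalVLM(mode="vqa", force_cpu=True)  # downloads the gated HF model during warmup()
```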
plugins/moondream/tests/test_moondream_local_vlm.py (1)
34-46: LGTM: VQA fixture properly configured. The fixture correctly gates on HF_TOKEN availability and ensures proper warmup and cleanup.
plugins/moondream/example/moondream_vlm_example.py (1)
15-30: LGTM: Agent creation correctly configured. The agent is properly wired with CloudVLM, Stream edge, TTS, STT, and turn detection components.
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (3)
35-58: LGTM: Constructor properly initializes CloudVLM. The initialization correctly sets up frame buffers, executor, mode configuration, and loads the model with proper API key validation.
60-95: LGTM: Video track watching correctly implemented. The method properly handles both shared and dedicated VideoForwarder instances, sets up frame consumers at appropriate FPS (1.0 for VLM), and initializes STT subscriptions with proper guards.
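As a rough illustration of the shared-versus-dedicated handling described here (a sketch reconstructed from the summaries in this review; the exact signatures of `VideoForwarder` and `start_event_consumer` are assumptions):

```python
async def watch_video_track(self, track, shared_forwarder=None) -> None:
    if shared_forwarder is not None:
        # Borrow the caller's forwarder; its lifecycle is owned elsewhere.
        self._video_forwarder = shared_forwarder
    else:
        # Create a dedicated forwarder throttled to ~1 fps for VLM work.
        self._video_forwarder = VideoForwarder(track, max_buffer=10, fps=1.0)
        await self._video_forwarder.start()

    # Every delivered frame lands in the buffer via _on_frame_received.
    await self._video_forwarder.start_event_consumer(self._on_frame_received)

    # Subscribe to STT transcripts exactly once.
    if not self._stt_subscription_setup:
        self._setup_stt_subscription()
        self._stt_subscription_setup = True
```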
129-192: LGTM: Frame processing correctly handles both VQA and caption modes. The implementation properly converts frames to PIL images, handles streaming responses via `_consume_stream`, emits appropriate events, and manages the `_is_processing` flag to prevent concurrent processing. Error handling is comprehensive.
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (4)
50-96: LGTM: Constructor properly initializes LocalVLM with device auto-detection. The initialization correctly handles device selection, with appropriate MPS-to-CPU fallback and clear warning messages. Frame buffers, executor, and state tracking are properly set up.
115-183: LGTM: Model loading comprehensively handles gated model authentication. The synchronous model loading function provides excellent error handling with clear guidance for HF token authentication, appropriate device placement (CUDA/CPU), and graceful fallback when compilation fails. Error messages are actionable and user-friendly.
254-328: LGTM: Frame processing correctly handles both VQA and caption modes. The implementation properly validates frame and model availability, converts frames to PIL images, handles both dict-wrapped and direct generator results, consumes streams via `asyncio.to_thread`, emits appropriate events, and manages the `_is_processing` flag. Error handling is comprehensive.
367-375: LGTM: Resource cleanup properly implemented. The close method correctly shuts down the executor, deletes the model reference to free memory, and logs the shutdown.
Actionable comments posted: 1
♻️ Duplicate comments (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1)
239-239: Remove unused `self._shutdown` attribute. Based on past review feedback, `self._shutdown` is set but never used anywhere in the class. This is dead code and should be removed. Apply this diff:

```diff
 def close(self):
     """Clean up resources."""
-    self._shutdown = True
     if hasattr(self, "executor"):
```
🧹 Nitpick comments (4)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (4)
28-33: Enhance docstring to follow Google style guide. The class docstring should follow the Google style guide with proper sections for description, attributes, and examples.
Apply this diff to improve the docstring:

```diff
-    """
-    Using the CloudVLM, you can send frames to the hosted Moondream model to perform either captioning or Visual queries.
-    The instructions are taken from the STT service and sent to the model along with the frame. Once the model has an output, the results are then vocalised with the supplied TTS service.
-
-    You can specify whether to use the caption endpoint or query (VQA).
-    """
+    """Cloud-based Moondream VLM for captioning and visual question answering.
+
+    This class interfaces with the hosted Moondream model to process video frames
+    for either captioning or visual question answering (VQA). Instructions from the
+    STT service are sent to the model along with frames, and outputs are emitted as
+    LLM events for downstream processing.
+
+    Attributes:
+        api_key: Moondream API key (from parameter or MOONDREAM_API_KEY env var).
+        mode: Operation mode, either "vqa" or "caption".
+        model: Initialized Moondream cloud VL model instance.
+
+    Examples:
+        >>> vlm = CloudVLM(api_key="your_key", mode="vqa")
+        >>> await vlm.watch_video_track(track)
+        >>> response = await vlm.simple_response("What do you see?")
+    """
```

As per coding guidelines.
35-57: Consider validating mode in `__init__` for fail-fast behavior. The mode parameter is only validated during `_process_frame` execution (line 179), which delays error detection. Validating in `__init__` would provide immediate feedback if an invalid mode is provided. Add validation after line 45:

```diff
 self.max_workers = max_workers
 self.mode = mode
+
+if self.mode not in ("vqa", "caption"):
+    raise ValueError(f"Invalid mode: {self.mode}. Must be 'vqa' or 'caption'.")
```
133-135: Simplify by removing racy lock check. The `locked()` check at line 133 introduces a time-of-check-to-time-of-use race: another coroutine may acquire the lock between the check and the `async with` statement at line 139. Since the lock is properly acquired afterward, this check only adds complexity without benefit; concurrent calls will simply queue at the lock acquisition instead. Apply this diff to simplify:

```diff
-    if self._processing_lock.locked():
-        logger.debug("Moondream processing already in progress, skipping")
-        return None
-
     latest_frame = self._latest_frame
     async with self._processing_lock:
```

Alternatively, if you want to skip rather than queue, use `acquire()` with `blocking=False`:

```diff
-    if self._processing_lock.locked():
-        logger.debug("Moondream processing already in progress, skipping")
+    if not self._processing_lock.acquire(blocking=False):
+        logger.debug("Moondream processing already in progress, skipping")
         return None

     latest_frame = self._latest_frame
-    async with self._processing_lock:
+    try:
         try:
             # Convert frame to PIL Image
             ...
         except Exception as e:
             logger.exception(f"Error processing frame: {e}")
             return LLMResponseEvent(original=None, text="", exception=e)
+    finally:
+        self._processing_lock.release()
```
27-242: Consider extracting common logic into a shared base class. CloudVLM and LocalVLM share substantial implementation patterns: frame buffering, VideoForwarder lifecycle, STT subscription, lock-based processing, and public API structure. This duplication increases maintenance burden and risk of inconsistency.
Consider extracting shared logic into a base class:

```python
class MoondreamVLMBase(llm.VideoLLM):
    """Shared base for Moondream VLM implementations."""

    def __init__(self, mode: Literal["vqa", "caption"], max_workers: int):
        # Common initialization: buffer, lock, executor, etc.
        ...

    async def watch_video_track(self, track, shared_forwarder):
        # Common video forwarding logic
        ...

    # Other shared methods...

    @abc.abstractmethod
    def _load_model(self):
        """Subclasses implement model loading (cloud vs. local)."""
        ...
```

Then CloudVLM and LocalVLM would only implement model-specific logic.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py
🧬 Code graph analysis (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (7)
agents-core/vision_agents/core/stt/events.py (1): STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (14-195), start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2): LatestNQueue (6-28), put_latest_nowait (22-28)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (2): _load_model (149-161), close (237-242)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (9): watch_video_track (185-220), _stop_watching_video_track (360-365), _on_frame_received (222-228), _setup_stt_subscription (230-237), on_stt_transcript (236-237), _on_stt_transcript (330-335), _process_frame (254-328), simple_response (337-358), close (367-375)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (8)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (8)
59-94: LGTM! The video track handling correctly supports both shared and self-managed VideoForwarder instances, with proper lifecycle management and one-time STT subscription setup.
96-102: LGTM! Frame buffering implementation is correct and matches the established pattern.
104-111: LGTM! STT subscription setup is correct and properly guarded.
113-126: LGTM! Stream consumption logic is defensive and correctly handles both expected string chunks and unexpected types with appropriate logging.
186-191: LGTM! STT transcript handling correctly validates text presence before processing.
193-214: Unused `processors` parameter in method signature. The `processors` parameter is accepted but never used in the implementation. This appears intentional for API compatibility with the base LLM interface, but consider documenting this in the docstring. If the parameter is unused by design (for interface compatibility), add a note to the docstring:

```diff
 """
 simple_response is a standardized way to create a response.

 Args:
     text: The text/question to respond to
-    processors: list of processors (which contain state) about the video/voice AI
+    processors: list of processors (which contain state) about the video/voice AI.
+        Note: Currently unused by CloudVLM.
     participant: optionally the participant object
```

Alternatively, if processors should be utilized, verify the intended behavior.
216-221: LGTM! Video track cleanup is correctly implemented.
223-235: LGTM! Model initialization correctly validates the API key and handles errors appropriately.
```python
def close(self):
    """Clean up resources."""
    self._shutdown = True
    if hasattr(self, "executor"):
        self.executor.shutdown(wait=False)
    logger.info("🛑 Moondream Processor closed")
```
Add comprehensive resource cleanup.
The close() method is missing critical cleanup steps compared to the LocalVLM implementation. It should stop the video forwarder and clean up the model reference to prevent resource leaks.
Apply this diff:
```diff
 def close(self):
     """Clean up resources."""
-    self._shutdown = True
+    if self._video_forwarder is not None:
+        # Note: caller should await _stop_watching_video_track() before close()
+        # or make close() async
+        logger.warning("Video forwarder not stopped before close()")
+    if hasattr(self, "executor"):
         self.executor.shutdown(wait=False)
-    logger.info("🛑 Moondream Processor closed")
+    if hasattr(self, "model") and self.model is not None:
+        del self.model
+        self.model = None
+    logger.info("🛑 Moondream Cloud VLM closed")
```

Alternatively, consider making close() async and properly awaiting _stop_watching_video_track().
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py
around lines 237 to 242, the close() method only sets _shutdown and shuts the
executor but misses stopping the video forwarder and clearing model references;
update close() to (a) stop the video forwarder by calling and awaiting
_stop_watching_video_track() (make close async if you need to await it) or
explicitly call the synchronous stop routine for the forwarder, (b) null out or
delete the model reference (e.g., self._model = None) to release resources, (c)
ensure executor.shutdown is only called if executor exists (as currently done),
and (d) add error handling/logging around each cleanup step so failures don't
prevent other resources from being freed.
Actionable comments posted: 1
♻️ Duplicate comments (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1)
163-198: Track VideoForwarder ownership to avoid stopping shared instances. When `watch_video_track` receives a `shared_forwarder`, the code stores it in `self._video_forwarder` but doesn't distinguish ownership. Later, `_stop_watching_video_track` (lines 332-337) unconditionally calls `stop()` on any forwarder, which would terminate the shared instance for all other consumers. Apply this diff to track ownership:

```diff
     self._frame_buffer: LatestNQueue[av.VideoFrame] = LatestNQueue(maxlen=10)
     self._latest_frame: Optional[av.VideoFrame] = None
     self._video_forwarder: Optional[VideoForwarder] = None
+    self._owns_video_forwarder = False
     self._stt_subscription_setup = False

     if shared_forwarder is not None:
         self._video_forwarder = shared_forwarder
+        self._owns_video_forwarder = False
         logger.info("🎥 Moondream Local VLM subscribing to shared VideoForwarder")
     else:
         self._video_forwarder = VideoForwarder(
             track,  # type: ignore[arg-type]
             max_buffer=10,
             fps=1.0,
             name="moondream_local_vlm_forwarder",
         )
+        self._owns_video_forwarder = True
         await self._video_forwarder.start()

 async def _stop_watching_video_track(self) -> None:
     """Stop video forwarding."""
     if self._video_forwarder is not None:
-        await self._video_forwarder.stop()
+        if self._owns_video_forwarder:
+            await self._video_forwarder.stop()
+        else:
+            logger.debug("Shared VideoForwarder left running; owner controls lifecycle")
         self._video_forwarder = None
         logger.info("Stopped video forwarding")
```
🧹 Nitpick comments (11)
plugins/moondream/tests/test_moondream_local_vlm.py (5)
69-69: Remove redundant warmup call. The `local_vlm_vqa` fixture already calls `warmup()` at line 43, so this second call is unnecessary. Apply this diff:

```diff
-    await local_vlm_vqa.warmup()
     assert local_vlm_vqa.model is not None, "Model must be loaded before test"
```
72-72: Consider using the public API instead of private members. Directly manipulating `_latest_frame` bypasses the intended API. For a more realistic test, consider setting up a proper video track flow or adding a public method to inject frames for testing.
90-90: Remove redundant warmup call. The `local_vlm_caption` fixture already calls `warmup()` at line 58, making this call unnecessary. Apply this diff:

```diff
-    await local_vlm_caption.warmup()
     assert local_vlm_caption.model is not None, "Model must be loaded before test"
```
93-93: Consider using the public API instead of private members. Directly setting `_latest_frame` circumvents the intended interface. For more robust testing, consider using the public video track API or adding a test-specific frame injection method.
99-102: Simplify redundant assertion. Line 99 already checks `len(response.text) > 0`, so the additional check at line 102 for `len(response.text.strip()) > 0` is redundant. If you want to verify non-whitespace content, replace line 99 instead. Apply this diff:

```diff
 assert response.text is not None
-assert len(response.text) > 0
 assert response.exception is None
-assert len(response.text.strip()) > 0
+assert len(response.text.strip()) > 0, "Response should contain non-whitespace text"
```

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)
149-156: Potentially redundant `.to(self.device)` call. Line 156 calls `.to(self.device)` after `from_pretrained` already specifies `device_map={"": self.device}`. The `device_map` parameter should handle device placement, making the explicit `.to()` call redundant. If device_map is working correctly, apply this diff:

```diff
 model = AutoModelForCausalLM.from_pretrained(
     self.model_name,
     device_map={"": self.device},
     dtype=self._dtype,
     trust_remote_code=True,
     cache_dir=self.options.model_dir,
     **load_kwargs,
-).to(self.device)
+)
```

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (5)
104-106: Consider using `self.executor` for consistency. While `asyncio.to_thread` works, using `self.executor` provides better control over threading behavior and is more consistent with the processor pattern used elsewhere. Based on learnings.
Apply this diff:

```diff
-    self.model = await asyncio.to_thread(  # type: ignore[func-returns-value]
-        lambda: self._load_model_sync()
-    )
+    loop = asyncio.get_event_loop()
+    self.model = await loop.run_in_executor(
+        self.executor, self._load_model_sync
+    )
```
259-259: Consider using `self.executor` for consistency. Using `asyncio.to_thread` here works, but `self.executor` would provide better control and consistency with the established pattern. Based on learnings.
Apply this diff:

```diff
-    result = await asyncio.to_thread(self.model.query, image, text, stream=True)
+    loop = asyncio.get_event_loop()
+    result = await loop.run_in_executor(
+        self.executor, self.model.query, image, text, True
+    )
```

Note: The `stream=True` keyword argument needs to be passed positionally or wrapped in a lambda if the API doesn't support it directly via `run_in_executor`.
self.executorprovides better threading control.Based on learnings
Apply this diff:
- answer = await asyncio.to_thread(self._consume_stream, stream) + loop = asyncio.get_event_loop() + answer = await loop.run_in_executor( + self.executor, self._consume_stream, stream + )
278-278: Consider usingself.executorfor consistency.Same pattern as lines 259 and 266—prefer
self.executorfor better control.Based on learnings
Apply this diff:
- result = await asyncio.to_thread(self.model.caption, image, length="normal", stream=True) + loop = asyncio.get_event_loop() + result = await loop.run_in_executor( + self.executor, self.model.caption, image, "normal", True + )Note: Positional arguments for
lengthandstreamsince keyword arguments may not work withrun_in_executor.
285-285: Consider usingself.executorfor consistency.Final instance of
asyncio.to_thread—same recommendation as previous occurrences.Based on learnings
Apply this diff:
- caption = await asyncio.to_thread(self._consume_stream, stream) + loop = asyncio.get_event_loop() + caption = await loop.run_in_executor( + self.executor, self._consume_stream, stream + )
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- plugins/moondream/tests/test_moondream_local_vlm.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (7 hunks)
- plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py
- plugins/moondream/tests/test_moondream_local_vlm.py
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py
🧬 Code graph analysis (4)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1)
plugins/moondream/tests/test_moondream_local.py (3): is_available (188-189), is_available (216-217), is_available (244-245)
plugins/moondream/tests/test_moondream_local_vlm.py (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (4): LocalVLM (32-345), warmup (94-97), close (339-345), simple_response (309-330)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (8)
agents-core/vision_agents/core/agents/agents.py (2): AgentOptions (93-103), default_agent_options (110-111)
agents-core/vision_agents/core/stt/events.py (1): STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2): LLMResponseChunkEvent (87-102), LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2): LLMResponseEvent (38-42), VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2): VideoForwarder (14-195), start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2): LatestNQueue (6-28), put_latest_nowait (22-28)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1): handle_device (7-11)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1): _load_model_sync (129-183)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (3)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (3): parse_detection_bbox (13-31), annotate_detections (48-111), handle_device (7-11)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1): MoondreamVideoTrack (16-76)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (4): process_video (101-143), _process_and_add_frame (205-234), _run_inference (163-175), _run_detection_sync (177-203)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (19)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1)
7-11: LGTM! The device selection logic is clean and appropriate. CUDA with float16 for GPU acceleration, CPU with float32 as the fallback: straightforward and correct.
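A minimal sketch of what such a helper can look like; the actual `handle_device` lives at lines 7-11 of `moondream_utils.py` and may differ in detail:

```python
import torch


def handle_device() -> tuple[str, torch.dtype]:
    """Pick the best available device and a matching dtype."""
    if torch.cuda.is_available():
        return "cuda", torch.float16  # GPU: half precision for speed and memory
    return "cpu", torch.float32       # CPU fallback: full precision
```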
plugins/moondream/tests/test_moondream_local_vlm.py (3)
20-25: LGTM! The fixture correctly loads and converts the test image to RGB format.
28-31: LGTM! Clean conversion from PIL Image to av.VideoFrame.
34-46: LGTM! The fixture properly initializes the VLM, warms it up, and ensures cleanup with the finally block.
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (6)
22-23: LGTM! Import updates correctly reference the new `handle_device` utility and the video track module.
55-81: LGTM! The `force_cpu` parameter is properly integrated and the device selection logic correctly delegates to `handle_device()` when not forcing CPU.
162-166: LGTM! The guarded compilation with fallback is a solid defensive pattern for platform compatibility.
185-228: LGTM! The video processing setup with lazy model loading and VideoForwarder sharing is well-structured and consistent with the cloud processor pattern.
234-278: LGTM! The inference flow properly offloads synchronous model operations to the executor, maintaining responsiveness in the async event loop.
280-301: LGTM! Frame processing with conditional annotation and robust error handling looks solid.
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (9)
72-74: MPS override is silent but acceptable. The code detects MPS and forces CPU mode with a warning. While this silently overrides the user's device choice, it's justified by the model's incompatibility. The warning message is clear.
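The override described here presumably reduces to a small guard along these lines (a hypothetical sketch; the actual warning text and placement are in the reviewed file):

```python
import logging

import torch

logger = logging.getLogger(__name__)


def resolve_force_cpu(force_cpu: bool) -> bool:
    """Downgrade MPS-capable machines to CPU, since Moondream does not support MPS."""
    if not force_cpu and torch.backends.mps.is_available():
        logger.warning("MPS detected but not supported by Moondream; falling back to CPU.")
        return True
    return force_cpu
```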
76-79: LGTM! Device selection logic is consistent with the detection processor and properly uses the `handle_device()` utility.
109-161: LGTM! Model loading with proper authentication handling, device mapping, and guarded compilation. The implementation is solid.
200-206: LGTM! Frame buffering with proper error handling is straightforward and correct.
208-216: LGTM! STT subscription setup follows the standard event handling pattern.
217-230: LGTM! Stream consumption with type checking and defensive handling of unexpected chunk types is well-implemented.
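The defensive consumption pattern praised here amounts to roughly the following (a sketch assuming the stream yields plain string chunks, as described):

```python
import logging

logger = logging.getLogger(__name__)


def consume_stream(stream) -> str:
    """Drain a Moondream streaming generator into a single string."""
    parts: list[str] = []
    for chunk in stream:
        if isinstance(chunk, str):
            parts.append(chunk)
        else:
            # Unexpected chunk types are logged and skipped instead of crashing the call.
            logger.warning("Skipping non-string chunk of type %s", type(chunk).__name__)
    return "".join(parts)
```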
302-307: LGTM! STT transcript handler correctly delegates to frame processing.
309-330: LGTM! The `simple_response` API correctly handles both VQA and caption modes with appropriate error handling.
339-345: LGTM! Resource cleanup is properly implemented with executor shutdown and model deletion.
```python
def __init__(
    self,
    mode: Literal["vqa", "caption"] = "vqa",
```
Validate mode parameter at initialization.
The mode parameter accepts Literal["vqa", "caption"] but there's no runtime validation. Invalid values could cause issues downstream in _process_frame.
Based on learnings
Apply this diff to add validation:
```diff
 def __init__(
     self,
     mode: Literal["vqa", "caption"] = "vqa",
     max_workers: int = 10,
     force_cpu: bool = False,
     model_name: str = "moondream/moondream3-preview",
     options: Optional[AgentOptions] = None,
 ):
     super().__init__()
+
+    if mode not in ("vqa", "caption"):
+        raise ValueError(f"mode must be 'vqa' or 'caption', got: {mode}")
     self.max_workers = max_workers
     self.mode = mode
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
around line 55, the mode parameter is typed as Literal["vqa","caption"] but not
validated at runtime; add a validation in the class initializer (after assigning
self.mode) that checks if self.mode is one of {"vqa", "caption"} and if not
raise a ValueError with a clear message (e.g., "Invalid mode: {self.mode}.
Expected 'vqa' or 'caption'") so invalid values are caught early before
_process_frame is called.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)
40-53: Update the docstring to reflect the new API. Lines 48-49 document a `device` parameter that no longer exists in the constructor signature. The API now uses `force_cpu` instead. This inconsistency will confuse users. Apply this diff:

```diff
     detect_objects: Object(s) to detect. Moondream uses zero-shot detection,
         so any object string works. Examples: "person", "car", "basketball",
         ["person", "car", "dog"]. Default: "person"
     fps: Frame processing rate
     interval: Processing interval in seconds
     max_workers: Number of worker threads
-    device: Device to run inference on ('cuda', 'mps', or 'cpu').
-        Auto-detects CUDA, then MPS (Apple Silicon), then defaults to CPU.
+    force_cpu: If True, forces CPU device regardless of hardware availability.
+        Otherwise auto-detects the best available device (CUDA > CPU).
+        Note: MPS is automatically converted to CPU as Moondream doesn't support MPS.
     model_name: Hugging Face model identifier (default: "moondream/moondream3-preview")
     options: AgentOptions for model directory configuration.
         If not provided, uses default_agent_options()
         which defaults to tempfile.gettempdir()
```
♻️ Duplicate comments (3)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (3)
52-64: Add runtime validation for the mode parameter. The `mode` parameter lacks runtime validation. While type hints document the expected values, they don't prevent invalid strings at runtime, which would cause issues downstream in `_process_frame`. Apply this diff to add validation:

```diff
 def __init__(
     self,
     mode: Literal["vqa", "caption"] = "vqa",
     max_workers: int = 10,
     force_cpu: bool = False,
     model_name: str = "moondream/moondream3-preview",
     options: Optional[AgentOptions] = None,
 ):
     super().__init__()
+
+    if mode not in ("vqa", "caption"):
+        raise ValueError(f"mode must be 'vqa' or 'caption', got: {mode}")
     self.max_workers = max_workers
     self.mode = mode
```
166-202: Track VideoForwarder ownership to avoid terminating shared resources. When `watch_video_track` receives a `shared_forwarder`, the current implementation doesn't track ownership. Later, `_stop_watching_video_track` (line 338) unconditionally calls `stop()` on the forwarder, which terminates the shared resource for all other consumers. Apply this diff to track ownership:

```diff
 def __init__(
     ...
 ):
     ...
     self._video_forwarder: Optional[VideoForwarder] = None
+    self._owns_video_forwarder = False
     self._stt_subscription_setup = False
     ...

 async def watch_video_track(
     self,
     track: aiortc.mediastreams.MediaStreamTrack,
     shared_forwarder: Optional[VideoForwarder] = None
 ) -> None:
     ...
     if shared_forwarder is not None:
         self._video_forwarder = shared_forwarder
+        self._owns_video_forwarder = False
         logger.info("🎥 Moondream Local VLM subscribing to shared VideoForwarder")
         ...
     else:
         self._video_forwarder = VideoForwarder(
             track,  # type: ignore[arg-type]
             max_buffer=10,
             fps=1.0,
             name="moondream_local_vlm_forwarder",
         )
+        self._owns_video_forwarder = True
         ...
```
335-340: Only stop owned VideoForwarder instances. This method unconditionally stops the forwarder, but should only stop forwarders that LocalVLM created, not borrowed ones.
Apply this diff:

```diff
 async def _stop_watching_video_track(self) -> None:
     """Stop video forwarding."""
     if self._video_forwarder is not None:
-        await self._video_forwarder.stop()
+        if self._owns_video_forwarder:
+            await self._video_forwarder.stop()
+        else:
+            logger.debug("Shared VideoForwarder left running; owner controls lifecycle")
         self._video_forwarder = None
         logger.info("Stopped video forwarding")
```
🧹 Nitpick comments (3)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (2)
104-106: Consider consistency in thread pool usage. While `asyncio.to_thread` does use a thread pool internally, it uses Python's default thread pool rather than the `self.executor` created at line 87. This means the `max_workers` parameter has no effect on these operations. If the intent is to control concurrency with `max_workers`, use the configured executor:

```diff
-    self.model = await asyncio.to_thread(  # type: ignore[func-returns-value]
-        lambda: self._load_model_sync()
-    )
+    loop = asyncio.get_event_loop()
+    self.model = await loop.run_in_executor(
+        self.executor,
+        self._load_model_sync
+    )
```

The same pattern applies to lines 262, 269, 281, and 288. If the current approach is intentional (using the default pool), consider removing the unused `self.executor` to clarify the design.
342-348: Consider waiting for executor tasks during shutdown. Using `wait=False` in `executor.shutdown()` may leave tasks running when the model is deleted, potentially causing errors during cleanup. Consider waiting for tasks to complete gracefully:

```diff
 def close(self):
     """Clean up resources."""
-    self.executor.shutdown(wait=False)
+    self.executor.shutdown(wait=True)
     if self.model is not None:
         del self.model
         self.model = None
     logger.info("🛑 Moondream Local VLM closed")
```

Alternatively, if immediate shutdown is required, consider using a brief timeout to balance responsiveness with graceful completion.
plugins/moondream/tests/test_moondream_local.py (1)
263-269: Remove redundant test code. The test at line 264 uses `force_cpu=True`, which doesn't verify MPS-to-CPU conversion; it bypasses the auto-detection logic entirely. The comment claims it tests "explicit MPS parameter," but the old API no longer exists. This segment duplicates what line 256 already verifies. Apply this diff:

```diff
-    # Also test explicit MPS parameter
-    processor2 = LocalDetectionProcessor(force_cpu=True)
-    try:
-        # Verify explicit MPS is also converted to CPU
-        assert processor2.device == "cpu"
-    finally:
-        processor2.close()
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- plugins/moondream/tests/test_moondream_local.py (4 hunks)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (7 hunks)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/moondream/tests/test_moondream_local.py
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py
🔇 Additional comments (9)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (2)
235-303: Excellent async lock handling and error management. The non-blocking lock acquisition with timeout (lines 244-249) elegantly prevents concurrent processing without blocking callers. The frame processing logic properly handles both VQA and caption modes with good error handling and event emission.
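One way to implement the skip-if-busy behavior described here with an `asyncio.Lock` is a short timed acquire (a sketch of the pattern, not the code under review):

```python
import asyncio


async def process_latest_frame(self, transcript: str):
    try:
        # Grab the processing lock, but give up quickly instead of queueing behind a slow inference.
        await asyncio.wait_for(self._processing_lock.acquire(), timeout=0.01)
    except asyncio.TimeoutError:
        return None  # a previous frame is still being processed; skip this one
    try:
        return await self._process_frame(transcript)
    finally:
        self._processing_lock.release()
```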
109-164: Well-structured model loading with helpful error guidance. The authentication error handling is particularly helpful, providing clear instructions for users encountering gated model access. The fallback when compilation fails (lines 143-147) ensures robustness.
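The compilation fallback mentioned here (lines 143-147) typically looks something like this sketch; `Module.compile()` is assumed to be the PyTorch 2.x in-place wrapper around `torch.compile`:

```python
import logging

import torch

logger = logging.getLogger(__name__)


def compile_if_possible(model: torch.nn.Module) -> torch.nn.Module:
    """Try to compile the model for speed, but never fail hard if the backend can't."""
    try:
        model.compile()  # optional optimization; unsupported on some platforms/backends
    except Exception as exc:
        logger.warning("Model compilation failed, continuing uncompiled: %s", exc)
    return model
```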
plugins/moondream/tests/test_moondream_local.py (3)
42-48: LGTM: Test fixture correctly uses `force_cpu=True`. This ensures deterministic behavior in CI environments and avoids hardware dependencies during testing.
271-277: LGTM: Explicit CPU test is correct. Properly validates the `force_cpu=True` flag behavior.
279-289: LGTM: CUDA auto-detection test is correct. Properly validates that the default constructor selects CUDA when available.
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (4)
78-81: Device resolution logic is clean and well-structured. The conditional handling of `force_cpu` provides a clear override path while delegating auto-detection to the centralized `handle_device()` utility.
161-167: LGTM: Non-fatal compilation fallback is good defensive programming. The try-except pattern ensures that compilation failures don't prevent model usage, with appropriate warning logs for debugging.
22-23: Verify the `handle_device` utility exists and returns the expected types. The code imports `handle_device` from `moondream_utils` and expects it to return a tuple of `(device, dtype)`. Ensure this function is properly defined and its return types match the usage at line 81.
55-65: Breaking API change: Verify all callers are updated. The constructor signature changed from accepting a `device` string parameter to a `force_cpu` boolean flag. This is a breaking change for any existing code using the old API.
Actionable comments posted: 1
♻️ Duplicate comments (8)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (7)
104-106: Prefer `self.executor` over `asyncio.to_thread` for consistency. Line 104 uses `asyncio.to_thread`, but the class maintains `self.executor` (Line 87). For consistency and explicit thread pool management, use `self.executor` as suggested in past reviews. Based on learnings.
Apply this diff:

```diff
-        self.model = await asyncio.to_thread(  # type: ignore[func-returns-value]
-            lambda: self._load_model_sync()
-        )
+        loop = asyncio.get_event_loop()
+        self.model = await loop.run_in_executor(self.executor, self._load_model_sync)
```
263-263: Prefer `self.executor` over `asyncio.to_thread` for VQA query. Line 263 uses `asyncio.to_thread` for model inference. Use `self.executor` consistently as suggested in past reviews. Based on learnings.
Apply this diff:

```diff
-            result = await asyncio.to_thread(self.model.query, image, text, stream=True)
+            loop = asyncio.get_event_loop()
+            result = await loop.run_in_executor(self.executor, self.model.query, image, text, stream=True)
```
270-270: Prefer `self.executor` for stream consumption. Line 270 uses `asyncio.to_thread`. Use `self.executor` for consistency. Based on learnings.
Apply this diff:

```diff
-            answer = await asyncio.to_thread(self._consume_stream, stream)
+            loop = asyncio.get_event_loop()
+            answer = await loop.run_in_executor(self.executor, self._consume_stream, stream)
```
282-289: Preferself.executorfor caption inference and stream consumption.Lines 282 and 289 use
asyncio.to_thread. Useself.executorfor consistency.Based on learnings
Apply this diff:
```diff
-        result = await asyncio.to_thread(self.model.caption, image, length="normal", stream=True)
+        loop = asyncio.get_event_loop()
+        # run_in_executor only forwards positional arguments, so wrap the keyword call
+        result = await loop.run_in_executor(
+            self.executor, lambda: self.model.caption(image, length="normal", stream=True)
+        )
         if isinstance(result, dict) and "caption" in result:
             stream = result["caption"]
         else:
             stream = result
-        caption = await asyncio.to_thread(self._consume_stream, stream)
+        caption = await loop.run_in_executor(self.executor, self._consume_stream, stream)
```
52-79: Add runtime validation for the `mode` parameter. The `mode` parameter accepts `Literal["vqa", "caption"]` but lacks runtime validation. Invalid values passed at runtime (e.g., from configuration files) would cause issues in `_process_frame` at line 258. Based on learnings.
Apply this diff:
```diff
     def __init__(
         self,
         mode: Literal["vqa", "caption"] = "vqa",
         max_workers: int = 10,
         force_cpu: bool = False,
         model_name: str = "moondream/moondream3-preview",
         options: Optional[AgentOptions] = None,
     ):
         super().__init__()
+
+        if mode not in ("vqa", "caption"):
+            raise ValueError(f"mode must be 'vqa' or 'caption', got: {mode}")
 
         self.max_workers = max_workers
         self.mode = mode
```
166-197: Don't stop a borrowed `VideoForwarder`; track ownership. Line 340 in `_stop_watching_video_track` calls `stop()` on `self._video_forwarder` regardless of whether it's shared. Stopping a shared forwarder terminates video for all consumers. Track ownership and only stop forwarders created by this instance. Based on learnings.
Apply this diff:
```diff
         self._video_forwarder: Optional[VideoForwarder] = None
+        self._owns_video_forwarder = False
         self._stt_subscription_setup = False
@@
         if shared_forwarder is not None:
             self._video_forwarder = shared_forwarder
+            self._owns_video_forwarder = False
             logger.info("🎥 Moondream Local VLM subscribing to shared VideoForwarder")
@@
         else:
             self._video_forwarder = VideoForwarder(
                 track,  # type: ignore[arg-type]
                 max_buffer=10,
                 fps=1.0,
                 name="moondream_local_vlm_forwarder",
             )
+            self._owns_video_forwarder = True
             await self._video_forwarder.start()
```
_stop_watching_video_track:async def _stop_watching_video_track(self) -> None: """Stop video forwarding.""" if self._video_forwarder is not None: - await self._video_forwarder.stop() + if self._owns_video_forwarder: + await self._video_forwarder.stop() + else: + logger.debug("Shared VideoForwarder left running; owner controls lifecycle") self._video_forwarder = None logger.info("Stopped video forwarding")
344-350: Guard executor shutdown to prevent exceptions. Line 346 calls `self.executor.shutdown(wait=False)` without checking whether the executor exists. If `close()` is called before initialization completes, or called multiple times, this could raise an `AttributeError`. Based on learnings.
Apply this diff:
```diff
     def close(self):
         """Clean up resources."""
-        self.executor.shutdown(wait=False)
+        if self.executor is not None:
+            self.executor.shutdown(wait=False)
         if self.model is not None:
             del self.model
             self.model = None
```
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)
154-161: Redundant device placement remains; lines 156 and 161 conflict. Line 156 passes `device_map={"": self._device}` (a `torch.device` object), then line 161 calls `.to(self._device)`. This redundancy still exists:

- `device_map` expects string identifiers (`"cuda"`, `"cpu"`), not `torch.device` objects
- When `device_map` is set, the model is already placed; calling `.to()` is unnecessary

Apply this diff:

```diff
         model = AutoModelForCausalLM.from_pretrained(
             self.model_name,
-            device_map={"": self._device},
-            dtype=self._dtype,
             trust_remote_code=True,
             cache_dir=self.options.model_dir,
             **load_kwargs,
-        ).to(self._device)  # type: ignore[arg-type]
+        )
+
+        # Place model on device with specified dtype
+        model = model.to(device=self._device, dtype=self._dtype)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- agents-core/vision_agents/core/agents/agents.py (2 hunks)
- plugins/moondream/example/moondream_vlm_example.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (7 hunks)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- plugins/moondream/example/moondream_vlm_example.py
- agents-core/vision_agents/core/agents/agents.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
🧬 Code graph analysis (2)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (3)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (3)
- parse_detection_bbox (13-31)
- annotate_detections (48-111)
- handle_device (7-11)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
- MoondreamVideoTrack (16-76)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
- VideoForwarder (14-195)
- start_event_consumer (109-195)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (7)
agents-core/vision_agents/core/agents/agents.py (2)
- AgentOptions (92-102)
- default_agent_options (109-110)
agents-core/vision_agents/core/stt/events.py (1)
- STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2)
- LLMResponseChunkEvent (87-102)
- LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
- LLMResponseEvent (38-42)
- VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
- VideoForwarder (14-195)
- start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2)
- LatestNQueue (6-28)
- put_latest_nowait (22-28)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1)
- handle_device (7-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (5)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (4)
22-23: LGTM! Import updates align with new device handling. The addition of `handle_device` and the updated `MoondreamVideoTrack` import path are appropriate for the refactored device management.
55-81: Device handling logic is clean and well-structured. The `force_cpu` parameter provides an explicit CPU override while `handle_device()` manages automatic device selection. The fallback to CPU when forcing is clear.
113-116: Device property provides clean public access. Exposing the device type as a string property is a good ergonomic choice for logging and debugging.
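A small sketch of the property shape being praised; the class and attribute names are assumed for illustration:

```python
import torch


class ExampleProcessor:
    def __init__(self) -> None:
        self._device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    @property
    def device(self) -> str:
        # Expose only the device type string ("cuda" / "cpu") for logging.
        return self._device.type
```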
190-232: LGTM! VideoForwarder integration follows the shared forwarder pattern correctly. The logic properly handles both shared and owned forwarder cases, subscribing at the specified fps when shared and creating its own otherwise. The consumer callback setup is appropriate.
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1)
72-79: LGTM! Device handling properly manages MPS incompatibility. The MPS detection with CPU fallback and clear warning is appropriate. The `force_cpu` and `handle_device()` flow mirrors the detection processor correctly.
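For reference, a hedged sketch of that MPS-aware selection; the warning text and the order of checks are assumptions, not the PR's exact code:

```python
import logging

import torch

logger = logging.getLogger(__name__)


def pick_device(force_cpu: bool = False) -> torch.device:
    if force_cpu:
        return torch.device("cpu")
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        logger.warning("Model is CUDA-oriented; MPS detected, falling back to CPU")
        return torch.device("cpu")
    return torch.device("cpu")
```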
```python
        model = AutoModelForCausalLM.from_pretrained(
            self.model_name,
            device_map={"": self.device},
            dtype=self._dtype,
            **load_kwargs,
        )
```
Redundant device placement: `device_map` combined with `.to()` on line 139.
Line 133 sets device_map={"": self.device} (a torch.device object), and Line 139 calls .to("cpu") conditionally. The device_map expects string identifiers, not torch.device objects, and the conditional .to() adds complexity.
Apply this diff:
```diff
         model = AutoModelForCausalLM.from_pretrained(
             self.model_name,
-            device_map={"": self.device},
-            dtype=self._dtype,
             **load_kwargs,
         )
-        if self.force_cpu:
-            model.to("cpu")  # type: ignore[arg-type]
+        # Place model on device with specified dtype
+        model = model.to(device=self.device, dtype=self._dtype)
         model.eval()
```
🤖 Prompt for AI Agents
In plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
around lines 131 to 136, remove the device_map argument passed to
AutoModelForCausalLM.from_pretrained (device_map={"": self.device}) because
device_map expects string identifiers and we already perform explicit placement
with .to(...) later; call from_pretrained with self.model_name,
dtype=self._dtype and **load_kwargs only, and rely on the subsequent .to(...)
call to move the model to the desired device.
Actionable comments posted: 0
♻️ Duplicate comments (4)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)
154-161: Remove redundant device placement. Lines 156 and 161 specify device placement twice: `device_map={"": self._device}` expects string identifiers but receives a `torch.device` object, and the subsequent `.to(self._device)` is redundant when `device_map` is set. Apply this diff:
```diff
         model = AutoModelForCausalLM.from_pretrained(
             self.model_name,
-            device_map={"": self._device},
-            dtype=self._dtype,
             trust_remote_code=True,
             cache_dir=self.options.model_dir,
             **load_kwargs,
-        ).to(self._device)
+        )
+
+        # Move model to target device with specified dtype
+        model = model.to(device=self._device, dtype=self._dtype)
```
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (3)
54-65: Validate the `mode` parameter at initialization. The `mode` parameter accepts `Literal["vqa", "caption"]` but lacks runtime validation. Invalid values could cause issues downstream in `_process_frame`. Apply this diff:
```diff
     def __init__(
         self,
         mode: Literal["vqa", "caption"] = "vqa",
         max_workers: int = 10,
         force_cpu: bool = False,
         model_name: str = "moondream/moondream3-preview",
         options: Optional[AgentOptions] = None,
     ):
         super().__init__()
+
+        if mode not in ("vqa", "caption"):
+            raise ValueError(f"mode must be 'vqa' or 'caption', got: {mode}")
 
         self.max_workers = max_workers
         self.mode = mode
```
133-142: Remove redundant device placement. Line 135 sets `device_map={"": self.device}` and line 141 conditionally calls `.to("cpu")`. When `device_map` is specified, the model is already placed; the subsequent `.to()` is unnecessary. Apply this diff:
```diff
         model = AutoModelForCausalLM.from_pretrained(
             self.model_name,
-            device_map={"": self.device},
-            dtype=self._dtype,
             **load_kwargs,
         )
-        if self.force_cpu:
-            model.to("cpu")  # type: ignore[arg-type]
+        # Place model on device with specified dtype
+        model = model.to(device=self.device, dtype=self._dtype)
         model.eval()
```
168-199: Track VideoForwarder ownership to avoid terminating borrowed forwarders. When `watch_video_track` is called with a `shared_forwarder`, line 182 borrows it, but `_stop_watching_video_track` (lines 339-344) stops it unconditionally. This shuts down the shared forwarder for every other consumer. Apply this diff:
```diff
         self._frame_buffer: LatestNQueue[av.VideoFrame] = LatestNQueue(maxlen=10)
         self._latest_frame: Optional[av.VideoFrame] = None
         self._video_forwarder: Optional[VideoForwarder] = None
+        self._owns_video_forwarder = False
         self._stt_subscription_setup = False
@@
         if shared_forwarder is not None:
             self._video_forwarder = shared_forwarder
+            self._owns_video_forwarder = False
             logger.info("🎥 Moondream Local VLM subscribing to shared VideoForwarder")
@@
         else:
             self._video_forwarder = VideoForwarder(
                 track,  # type: ignore[arg-type]
                 max_buffer=10,
                 fps=1.0,
                 name="moondream_local_vlm_forwarder",
             )
+            self._owns_video_forwarder = True
             await self._video_forwarder.start()
@@
     async def _stop_watching_video_track(self) -> None:
         """Stop video forwarding."""
         if self._video_forwarder is not None:
-            await self._video_forwarder.stop()
+            if self._owns_video_forwarder:
+                await self._video_forwarder.stop()
+            else:
+                logger.debug("Shared VideoForwarder left running; owner controls lifecycle")
             self._video_forwarder = None
```
🧹 Nitpick comments (4)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (2)
244-252: Consider simplifying lock acquisition. Lines 249-252 manually acquire the lock with a try-except, then release it in a finally block (lines 306-307). This pattern is more complex than necessary; the initial lock check at line 244 already guards against concurrent access.
Consider using the lock as a context manager throughout:
```diff
-        try:
-            await self._processing_lock.acquire()
-        except Exception as e:
-            logger.warning(f"Failed to acquire lock: {e}")
-            return None
-
-        latest_frame = self._latest_frame
-
-        try:
+        async with self._processing_lock:
+            latest_frame = self._latest_frame
+            frame_array = latest_frame.to_ndarray(format="rgb24")
             ...
-        except Exception as e:
-            logger.exception(f"Error processing frame: {e}")
-            return LLMResponseEvent(original=None, text="", exception=e)
-        finally:
-            if self._processing_lock.locked():
-                self._processing_lock.release()
+        except Exception as e:
+            logger.exception(f"Error processing frame: {e}")
+            return LLMResponseEvent(original=None, text="", exception=e)
```
260-300: Use the configured executor for thread pool management. Lines 265, 272, 284, and 291 use `asyncio.to_thread()`, which creates ad-hoc threads. The class already initializes `self.executor` (a `ThreadPoolExecutor` at line 89) for this purpose. Using the executor provides better resource pooling. Apply this pattern throughout:
```diff
-        result = await asyncio.to_thread(self.model.query, image, text, stream=True)
+        loop = asyncio.get_event_loop()
+        result = await loop.run_in_executor(self.executor, self.model.query, image, text, True)
         if isinstance(result, dict) and "answer" in result:
             stream = result["answer"]
         else:
             stream = result
-        answer = await asyncio.to_thread(self._consume_stream, stream)
+        answer = await loop.run_in_executor(self.executor, self._consume_stream, stream)
```
(Apply similar changes to the caption mode at lines 284 and 291.)
plugins/moondream/example/README.md (1)
1-2: Consider adding a comma for clarity. Static analysis suggests: "Moondream example**,** Please see root readme for details."
plugins/moondream/README.md (1)
168-168: Format bare URL as a Markdown link. Static analysis flags the bare URL at line 168. Consider formatting it as a proper Markdown link for better readability:
```diff
-- Request access at https://huggingface.co/moondream/moondream3-preview
+- Request access at [huggingface.co/moondream/moondream3-preview](https://huggingface.co/moondream/moondream3-preview)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- plugins/moondream/README.md (5 hunks)
- plugins/moondream/example/README.md (1 hunks)
- plugins/moondream/example/moondream_vlm_example.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (2 hunks)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (7 hunks)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1 hunks)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/moondream/example/moondream_vlm_example.py
- plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py
- plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py
🧬 Code graph analysis (4)
plugins/moondream/example/moondream_vlm_example.py (3)
agents-core/vision_agents/core/agents/agents.py (7)
- Agent (125-1355)
- create_user (682-694)
- create_call (696-701)
- subscribe (307-319)
- simple_response (292-305)
- join (471-554)
- finish (556-589)
agents-core/vision_agents/core/agents/agent_launcher.py (1)
- AgentLauncher (18-125)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1)
- CloudVLM (27-249)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (8)
agents-core/vision_agents/core/agents/agents.py (6)
- AgentOptions (92-102)
- default_agent_options (109-110)
- subscribe (307-319)
- join (471-554)
- simple_response (292-305)
- close (591-663)
agents-core/vision_agents/core/stt/events.py (1)
- STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2)
- LLMResponseChunkEvent (87-102)
- LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
- LLMResponseEvent (38-42)
- VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
- VideoForwarder (14-195)
- start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2)
- LatestNQueue (6-28)
- put_latest_nowait (22-28)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1)
- handle_device (7-11)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (9)
- watch_video_track (66-101)
- _on_frame_received (103-109)
- _setup_stt_subscription (111-118)
- on_stt_transcript (117-118)
- _on_stt_transcript (193-198)
- _consume_stream (120-133)
- _process_frame (135-191)
- simple_response (200-221)
- close (244-249)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (3)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (3)
- parse_detection_bbox (13-31)
- annotate_detections (48-111)
- handle_device (7-11)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
- MoondreamVideoTrack (16-79)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
- VideoForwarder (14-195)
- start_event_consumer (109-195)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (1)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
MoondreamVideoTrack(16-79)
🪛 LanguageTool
plugins/moondream/README.md
[uncategorized] ~8-~8: Possible missing comma found.
Context: ...s Choose between cloud-hosted or local processing depending on your needs. When running l...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~165-~165: Possible missing article found.
Context: ... the model from HuggingFace and runs on device. It supports both VQA and captioning mo...
(AI_HYDRA_LEO_MISSING_THE)
[uncategorized] ~234-~234: Possible missing comma found.
Context: ...ry configuration. If not provided, uses default which defaults to tempfile.gettempdir()...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~240-~240: Loose punctuation mark.
Context: ...e. ### CloudVLM Parameters - api_key: str - API key for Moondream Cloud API. ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~241-~241: Loose punctuation mark.
Context: ..._API_KEYenvironment variable. -mode`: Literal["vqa", "caption"] - "vqa" for v...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~248-~248: Loose punctuation mark.
Context: ...mits. ### LocalVLM Parameters - mode: Literal["vqa", "caption"] - "vqa" for v...
(UNLIKELY_OPENING_PUNCTUATION)
plugins/moondream/example/README.md
[typographical] ~1-~1: Consider adding a comma here.
Context: ## Moondream example Please see root readme for details.
(PLEASE_COMMA)
🪛 markdownlint-cli2 (0.18.1)
plugins/moondream/README.md
168-168: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (13)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (2)
19-19: LGTM: Import path correctly updated. The import path for `MoondreamVideoTrack` now reflects the module reorganization into the `detection` subpackage.
31-47: LGTM: Documentation improvements enhance clarity. The updated docstring provides clearer guidance on rate limits and default values, making the API more discoverable.
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (3)
55-65: LGTM: Improved API with the `force_cpu` parameter. Replacing the `device` string parameter with a `force_cpu` boolean simplifies the API and enables automatic device detection, improving the user experience.
78-81: LGTM: Device selection logic is correct. The conditional device selection properly handles both forced CPU mode and automatic detection.
113-116: LGTM: Clean device property for external access. The property provides a string representation of the internal `torch.device`, maintaining a clean public interface.
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (2)
74-76: LGTM: MPS incompatibility handled gracefully. The automatic CPU fallback for MPS devices is appropriate given the model's CUDA dependencies, and the warning message clearly explains the behavior.
346-352: LGTM: Resource cleanup is thorough. The `close()` method properly shuts down the executor and cleans up the model reference.
15-29: LGTM: Agent creation is straightforward. The `create_agent` function properly constructs an agent with CloudVLM, handling the API key via environment variable; a condensed sketch follows.
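For readers who want the gist without opening the example, a condensed sketch of that construction, assuming the plugin exports `CloudVLM` from `vision_agents.plugins.moondream` and accepts the `api_key` and `mode` parameters documented above; the full Agent wiring is omitted here:

```python
import os

from vision_agents.plugins import moondream

# Read the key from the environment, as the example does.
llm = moondream.CloudVLM(
    api_key=os.environ["MOONDREAM_API_KEY"],
    mode="vqa",  # or "caption" for scene descriptions
)
```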
32-49: LGTM: Call joining and event handling are correct. The `join_call` function properly sets up the call, subscribes to participant events, and triggers the agent response. The 2-second delay before prompting is a reasonable grace period.
52-53: LGTM: CLI integration follows the standard pattern. The main entry point correctly wires the launcher with the agent creation and call-joining functions.
plugins/moondream/README.md (3)
12-14: LGTM: Installation command uses standard extras syntax. The updated installation command `vision-agents[moondream]` follows Python packaging conventions for optional dependencies.
111-210: LGTM: VLM examples are comprehensive and clear. The Quick Start section provides complete, runnable examples for both CloudVLM and LocalVLM, with clear explanations of the VQA and caption modes.
238-254: LGTM: Parameter documentation is thorough. The CloudVLM and LocalVLM configuration sections clearly document all parameters with types, defaults, and usage guidance. The MPS-to-CPU conversion note is particularly helpful.
Summary by CodeRabbit
New Features
Tests
Documentation
Chores