
Conversation


@Nash0x7E2 Nash0x7E2 commented Nov 5, 2025

  • Adds support for both Caption and VQA on Moondream
  • Adds a basic example to the plugin directory

Summary by CodeRabbit

  • New Features

    • Cloud and local vision–language models (VQA & captioning) with STT-triggered inputs, streaming responses, and a demo script to join calls.
    • Agent warmup now includes processor warmups.
  • Tests

    • Integration tests added for cloud and local VQA and captioning flows.
  • Documentation

    • Expanded Moondream docs and example README with quick-starts and usage guidance.
  • Chores

    • Project manifest added; improved device selection (auto-detect/force_cpu) and import/path cleanups.


coderabbitai bot commented Nov 5, 2025

Walkthrough

Adds Moondream VLM support: new CloudVLM and LocalVLM implementations (streaming VQA/caption), STT integration and device selection, examples and packaging, updated exports/imports, agent warmup extension, and integration tests for cloud and local flows.
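
As a quick orientation, here is a minimal sketch of how the new CloudVLM could be wired into an agent, pieced together from the example and review snippets below; the Agent constructor keywords and create_call arguments are assumptions, not the verified API:

import asyncio

from vision_agents.core.agents import Agent  # assumed import path
from vision_agents.core.edge.types import User
from vision_agents.plugins.moondream import CloudVLM


async def main() -> None:
    # CloudVLM reads MOONDREAM_API_KEY from the environment when api_key is omitted.
    vlm = CloudVLM(mode="vqa")

    # Edge, TTS, STT, and turn-detection wiring is elided; keyword names here are assumptions.
    agent = Agent(agent_user=User(name="Moondream demo"), llm=vlm)

    call = await agent.create_call("default", "moondream-demo")  # assumed arguments
    async with agent.join(call):          # join() returns an async context manager
        await agent.edge.open_demo(call)  # open the demo UI
        await agent.finish()              # run until the call ends


if __name__ == "__main__":
    asyncio.run(main())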

Changes

Cohort / File(s) Summary
Cloud VLM
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py
New CloudVLM (llm.VideoLLM): API-key loading, frame buffering, watch_video_track/_on_frame_received, optional shared VideoForwarder, STT subscription, vqa/caption streaming, chunk & completion events, simple_response, locks, and cleanup.
Local VLM
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
New LocalVLM (llm.VideoLLM): device handling via handle_device, async model load/warmup, frame buffering, watch_video_track, STT subscription, vqa/caption streaming, simple_response, and resource cleanup.
Examples & packaging
plugins/moondream/example/moondream_vlm_example.py, plugins/moondream/example/pyproject.toml, plugins/moondream/example/README.md
New example script wiring CloudVLM into an edge agent (create_agent / join_call), example project manifest, and short README.
Tests (Cloud & Local)
plugins/moondream/tests/test_moondream_vlm.py, plugins/moondream/tests/test_moondream_local_vlm.py, plugins/moondream/tests/test_moondream_local.py
New integration tests and fixtures for CloudVLM and LocalVLM (VQA and caption) using a sample image/frame; tests skip when env vars (MOONDREAM_API_KEY / HF_TOKEN) missing; updated tests to use force_cpu.
Package exports & imports
plugins/moondream/vision_agents/plugins/moondream/__init__.py, plugins/moondream/vision_agents/plugins/moondream/detection/...
Adjusted imports to vision_agents.plugins.moondream..., added CloudVLM and LocalVLM to __all__, and updated detection module import paths.
Device utils
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py
Added handle_device() utility selecting device and dtype based on CUDA availability (imports torch).
Agent warmup
agents-core/vision_agents/core/agents/agent_launcher.py
Warmup workflow extended to include warmup-capable items from an agent's processors collection.
Minor cleanup
agents-core/vision_agents/core/agents/agents.py
Removed unused Coroutine typing import and discarded an unused result in create_call.
Docs
plugins/moondream/README.md
Expanded documentation to include VLMs (CloudVLM, LocalVLM), updated installation and configuration guidance, and replaced device param with force_cpu guidance.

Sequence Diagram(s)

sequenceDiagram
    participant Agent
    participant CloudVLM
    participant STT as STT Service
    participant Moondream as Moondream SDK
    participant Events as Event Bus

    Agent->>CloudVLM: watch_video_track(track)
    activate CloudVLM
    CloudVLM->>CloudVLM: setup forwarder & callbacks
    CloudVLM->>CloudVLM: _setup_stt_subscription()
    note right of CloudVLM: subscribe to STT transcripts

    rect rgb(245,245,255)
    Agent->>CloudVLM: frame delivered (_on_frame_received)
    CloudVLM->>CloudVLM: buffer frame, update _latest_frame
    end

    rect rgb(255,245,250)
    STT->>CloudVLM: transcript event
    CloudVLM->>CloudVLM: _process_frame(transcript)
    end

    rect rgb(240,255,240)
    CloudVLM->>Moondream: query(image,text) / caption(image) [stream=true]
    activate Moondream
    Moondream-->>CloudVLM: streaming generator (chunks)
    deactivate Moondream
    CloudVLM->>CloudVLM: _consume_stream -> assemble text
    loop stream chunks
        CloudVLM->>Events: emit LLMResponseChunkEvent
    end
    CloudVLM->>Events: emit LLMResponseCompletedEvent
    CloudVLM-->>Agent: return LLMResponseEvent
    end

    Agent->>CloudVLM: close()
    deactivate CloudVLM

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Areas to focus:

  • _process_frame and streaming consumption ordering, chunk vs completion events
  • Concurrency and processing-lock guards to avoid races
  • VideoForwarder lifecycle, frame throttling, and cleanup
  • STT subscription setup/teardown to prevent duplicate handlers/leaks (see the sketch after this list)
  • Model loading/authentication and device/compilation fallback paths
  • LocalDetectionProcessor signature change (force_cpu) and device decision logic
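
To make the subscription and concurrency points above concrete, here is a rough, illustrative sketch of a guarded one-time STT subscription (self.agent and self._on_stt_transcript are assumed to exist on the concrete class; this is not the reviewed implementation):

from vision_agents.core.stt.events import STTTranscriptEvent


class _SttSubscriptionSketch:
    _stt_subscription_setup = False

    def _setup_stt_subscription(self) -> None:
        # Guard so repeated watch_video_track() calls don't register duplicate handlers.
        if self._stt_subscription_setup:
            return

        @self.agent.events.subscribe
        async def on_stt_transcript(event: STTTranscriptEvent) -> None:
            # The event bus delivers every event type; ignore anything that isn't a transcript.
            if not isinstance(event, STTTranscriptEvent):
                return
            await self._on_stt_transcript(event)

        self._stt_subscription_setup = True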

Possibly related PRs

Suggested labels

examples, tests

Suggested reviewers

  • yarikdevcom
  • maxkahan

Poem

I set the camera like a bone against the dark,
the glass drinks light and spits a small, precise name.
The model threads its blackened mouth with answers,
each word a clean incision—cold, insistently alive.
We keep listening until the silence learns to speak.

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 61.11%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title 'Feat: Add support for Moondream VLM functions' directly summarizes the main change: introducing VLM (Vision Language Model) support for Moondream with CloudVLM and LocalVLM implementations for caption and VQA modes.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0de1cdd and 13fb325.

📒 Files selected for processing (1)
  • plugins/moondream/example/moondream_vlm_example.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/moondream/example/moondream_vlm_example.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Test "not integration"



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 58fd257 and a9a092e.

📒 Files selected for processing (5)
  • plugins/moondream/example/moondream_vlm_example.py (1 hunks)
  • plugins/moondream/example/pyproject.toml (1 hunks)
  • plugins/moondream/tests/test_moondream_vlm.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/__init__.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/moondream_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/moondream/example/moondream_vlm_example.py
  • plugins/moondream/tests/test_moondream_vlm.py
  • plugins/moondream/vision_agents/plugins/moondream/__init__.py
  • plugins/moondream/vision_agents/plugins/moondream/moondream_vlm.py
🧬 Code graph analysis (4)
plugins/moondream/example/moondream_vlm_example.py (3)
agents-core/vision_agents/core/edge/types.py (1)
  • User (15-18)
agents-core/vision_agents/core/agents/agents.py (1)
  • Agent (126-1356)
plugins/moondream/vision_agents/plugins/moondream/moondream_vlm.py (1)
  • CloudVLM (27-252)
plugins/moondream/tests/test_moondream_vlm.py (1)
plugins/moondream/vision_agents/plugins/moondream/moondream_vlm.py (3)
  • CloudVLM (27-252)
  • close (247-252)
  • simple_response (203-224)
plugins/moondream/vision_agents/plugins/moondream/__init__.py (1)
plugins/moondream/vision_agents/plugins/moondream/moondream_vlm.py (1)
  • CloudVLM (27-252)
plugins/moondream/vision_agents/plugins/moondream/moondream_vlm.py (5)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2)
  • LatestNQueue (6-28)
  • put_latest_nowait (22-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a9a092e and f838a1e.

📒 Files selected for processing (3)
  • agents-core/vision_agents/core/agents/agents.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/__init__.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • agents-core/vision_agents/core/agents/agents.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/moondream/vision_agents/plugins/moondream/__init__.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py
🧬 Code graph analysis (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (6)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2)
  • LatestNQueue (6-28)
  • put_latest_nowait (22-28)
agents-core/vision_agents/core/agents/agents.py (1)
  • subscribe (308-320)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f838a1e and a0b5c9d.

📒 Files selected for processing (1)
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
🧬 Code graph analysis (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (6)
agents-core/vision_agents/core/agents/agents.py (2)
  • AgentOptions (93-103)
  • default_agent_options (110-111)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2)
  • LatestNQueue (6-28)
  • put_latest_nowait (22-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
plugins/moondream/example/moondream_vlm_example.py (1)

46-50: Fix the async context manager usage.

Line 46 uses with await agent.join(call): which will fail at runtime. The join() method returns an async context manager, so you must use async with instead of with await.

Apply this diff:

-    with await agent.join(call):
+    async with agent.join(call):
         # Open the demo UI
         await agent.edge.open_demo(call)
         # run till the call ends
         await agent.finish()
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1)

110-112: Guard the STT subscriber against non-transcript events.

The @self.agent.events.subscribe decorator subscribes to all events on the agent's event bus, not just STTTranscriptEvent. When non-STT events arrive (e.g., LLM or TTS events), accessing event.text will raise AttributeError, crashing the VLM.

Apply this diff:

         @self.agent.events.subscribe
         async def on_stt_transcript(event: STTTranscriptEvent):
+            if not isinstance(event, STTTranscriptEvent):
+                return
             await self._on_stt_transcript(event)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1)

185-220: Don't terminate a borrowed VideoForwarder.

When watch_video_track receives a shared_forwarder, _stop_watching_video_track (lines 360-365) unconditionally calls stop() on it, terminating the shared forwarder for all other consumers. Track whether LocalVLM owns the forwarder and only stop it when owned.

Apply this diff:

     def __init__(...):
         ...
         self._video_forwarder: Optional[VideoForwarder] = None
+        self._owns_video_forwarder = False
         ...
 
     async def watch_video_track(...):
         ...
         if shared_forwarder is not None:
             self._video_forwarder = shared_forwarder
+            self._owns_video_forwarder = False
             ...
         else:
             self._video_forwarder = VideoForwarder(...)
+            self._owns_video_forwarder = True
             ...
 
     async def _stop_watching_video_track(self) -> None:
         if self._video_forwarder is not None:
-            await self._video_forwarder.stop()
+            if self._owns_video_forwarder:
+                await self._video_forwarder.stop()
+            else:
+                logger.debug("Shared VideoForwarder left running; owner controls lifecycle")
             self._video_forwarder = None
🧹 Nitpick comments (1)
plugins/moondream/tests/test_moondream_local_vlm.py (1)

64-82: Consider providing a test helper for frame assignment.

Line 69 calls warmup and line 72 assigns directly to _latest_frame (a private attribute). While this works, consider adding a public test helper method like set_test_frame(frame) in LocalVLM to avoid reaching into private state.

Apply this pattern if you choose to add a helper:

In moondream_local_vlm.py:

def set_test_frame(self, frame: av.VideoFrame) -> None:
    """Set frame for testing purposes. Not for production use."""
    self._latest_frame = frame

Then in the test:

await local_vlm_vqa.warmup()
local_vlm_vqa.set_test_frame(golf_frame)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a0b5c9d and a82e2e0.

📒 Files selected for processing (8)
  • agents-core/vision_agents/core/agents/agent_launcher.py (1 hunks)
  • plugins/moondream/example/moondream_vlm_example.py (1 hunks)
  • plugins/moondream/tests/test_moondream_local_vlm.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/__init__.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py
  • agents-core/vision_agents/core/agents/agent_launcher.py
  • plugins/moondream/vision_agents/plugins/moondream/__init__.py
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
  • plugins/moondream/tests/test_moondream_local_vlm.py
  • plugins/moondream/example/moondream_vlm_example.py
🧬 Code graph analysis (8)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (1)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
  • MoondreamVideoTrack (16-76)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
  • MoondreamVideoTrack (16-76)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (6)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2)
  • LatestNQueue (6-28)
  • put_latest_nowait (22-28)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (9)
  • watch_video_track (185-220)
  • _on_frame_received (222-228)
  • _setup_stt_subscription (230-237)
  • on_stt_transcript (236-237)
  • _on_stt_transcript (330-335)
  • _consume_stream (239-252)
  • _process_frame (254-328)
  • simple_response (337-358)
  • close (367-375)
agents-core/vision_agents/core/agents/agent_launcher.py (4)
agents-core/vision_agents/core/llm/llm.py (1)
  • warmup (63-71)
agents-core/vision_agents/core/stt/stt.py (1)
  • warmup (39-47)
agents-core/vision_agents/core/turn_detection/turn_detection.py (1)
  • warmup (36-44)
agents-core/vision_agents/core/tts/tts.py (1)
  • warmup (73-81)
plugins/moondream/vision_agents/plugins/moondream/__init__.py (2)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1)
  • CloudVLM (27-250)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1)
  • LocalVLM (29-375)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (7)
agents-core/vision_agents/core/agents/agents.py (2)
  • AgentOptions (93-103)
  • default_agent_options (110-111)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2)
  • LatestNQueue (6-28)
  • put_latest_nowait (22-28)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (6)
  • watch_video_track (60-95)
  • _stop_watching_video_track (224-229)
  • _on_frame_received (97-103)
  • _setup_stt_subscription (105-112)
  • _consume_stream (114-127)
  • _process_frame (129-192)
plugins/moondream/tests/test_moondream_local_vlm.py (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (4)
  • LocalVLM (29-375)
  • warmup (98-101)
  • close (367-375)
  • simple_response (337-358)
plugins/moondream/example/moondream_vlm_example.py (4)
agents-core/vision_agents/core/edge/types.py (1)
  • User (15-18)
agents-core/vision_agents/core/agents/agents.py (3)
  • Agent (126-1356)
  • create_call (697-702)
  • finish (557-590)
agents-core/vision_agents/core/agents/agent_launcher.py (1)
  • AgentLauncher (18-124)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1)
  • CloudVLM (27-250)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (13)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)

23-23: LGTM: Import path correctly updated.

The import path now reflects the reorganized module structure with MoondreamVideoTrack under the detection subpackage.

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (1)

19-19: LGTM: Import path correctly updated.

Consistent with the module reorganization, aligning with the detection subpackage structure.

plugins/moondream/vision_agents/plugins/moondream/__init__.py (2)

8-12: LGTM: Imports correctly updated to absolute paths.

The imports now use absolute paths and include the new CloudVLM and LocalVLM classes, properly reflecting the module reorganization.


19-20: LGTM: Public API correctly expanded.

CloudVLM and LocalVLM are now properly exposed in the package's public API.

plugins/moondream/tests/test_moondream_local_vlm.py (1)

34-46: LGTM: VQA fixture properly configured.

The fixture correctly gates on HF_TOKEN availability and ensures proper warmup and cleanup.

plugins/moondream/example/moondream_vlm_example.py (1)

15-30: LGTM: Agent creation correctly configured.

The agent is properly wired with CloudVLM, Stream edge, TTS, STT, and turn detection components.

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (3)

35-58: LGTM: Constructor properly initializes CloudVLM.

The initialization correctly sets up frame buffers, executor, mode configuration, and loads the model with proper API key validation.


60-95: LGTM: Video track watching correctly implemented.

The method properly handles both shared and dedicated VideoForwarder instances, sets up frame consumers at appropriate FPS (1.0 for VLM), and initializes STT subscriptions with proper guards.


129-192: LGTM: Frame processing correctly handles both VQA and caption modes.

The implementation properly converts frames to PIL images, handles streaming responses via _consume_stream, emits appropriate events, and manages the _is_processing flag to prevent concurrent processing. Error handling is comprehensive.

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (4)

50-96: LGTM: Constructor properly initializes LocalVLM with device auto-detection.

The initialization correctly handles device selection, with appropriate MPS-to-CPU fallback and clear warning messages. Frame buffers, executor, and state tracking are properly set up.


115-183: LGTM: Model loading comprehensively handles gated model authentication.

The synchronous model loading function provides excellent error handling with clear guidance for HF token authentication, appropriate device placement (CUDA/CPU), and graceful fallback when compilation fails. Error messages are actionable and user-friendly.


254-328: LGTM: Frame processing correctly handles both VQA and caption modes.

The implementation properly validates frame and model availability, converts frames to PIL images, handles both dict-wrapped and direct generator results, consumes streams via asyncio.to_thread, emits appropriate events, and manages the _is_processing flag. Error handling is comprehensive.


367-375: LGTM: Resource cleanup properly implemented.

The close method correctly shuts down the executor, deletes the model reference to free memory, and logs the shutdown.

@GetStream GetStream deleted a comment from coderabbitai bot Nov 6, 2025
@GetStream GetStream deleted a comment from coderabbitai bot Nov 6, 2025
@GetStream GetStream deleted a comment from coderabbitai bot Nov 6, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1)

239-239: Remove unused self._shutdown attribute.

Based on past review feedback, self._shutdown is set but never used anywhere in the class. This is dead code and should be removed.

Apply this diff:

     def close(self):
         """Clean up resources."""
-        self._shutdown = True
         if hasattr(self, "executor"):
🧹 Nitpick comments (4)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (4)

28-33: Enhance docstring to follow Google style guide.

The class docstring should follow the Google style guide with proper sections for description, attributes, and examples.

Apply this diff to improve the docstring:

-    """
-    Using the CloudVLM, you can send frames to the hosted Moondream model to perform either captioning or Visual queries.
-    The instructions are taken from the STT service and sent to the model along with the frame. Once the model has an output, the results are then vocalised with the supplied TTS service.
-
-    You can specify whether to use the caption endpoint or query (VQA).
-    """
+    """Cloud-based Moondream VLM for captioning and visual question answering.
+
+    This class interfaces with the hosted Moondream model to process video frames
+    for either captioning or visual question answering (VQA). Instructions from the
+    STT service are sent to the model along with frames, and outputs are emitted as
+    LLM events for downstream processing.
+
+    Attributes:
+        api_key: Moondream API key (from parameter or MOONDREAM_API_KEY env var).
+        mode: Operation mode, either "vqa" or "caption".
+        model: Initialized Moondream cloud VL model instance.
+
+    Examples:
+        >>> vlm = CloudVLM(api_key="your_key", mode="vqa")
+        >>> await vlm.watch_video_track(track)
+        >>> response = await vlm.simple_response("What do you see?")
+    """

As per coding guidelines.


35-57: Consider validating mode in __init__ for fail-fast behavior.

The mode parameter is only validated during _process_frame execution (line 179), which delays error detection. Validating in __init__ would provide immediate feedback if an invalid mode is provided.

Add validation after line 45:

 self.max_workers = max_workers
 self.mode = mode
+
+if self.mode not in ("vqa", "caption"):
+    raise ValueError(f"Invalid mode: {self.mode}. Must be 'vqa' or 'caption'.")

133-135: Simplify by removing racy lock check.

The locked() check at line 133 introduces a time-of-check-to-time-of-use race: another coroutine may acquire the lock between the check and the async with statement at line 139. Since the lock is properly acquired afterward, this check only adds complexity without benefit—concurrent calls will simply queue at the lock acquisition instead.

Apply this diff to simplify:

-        if self._processing_lock.locked():
-            logger.debug("Moondream processing already in progress, skipping")
-            return None
-
         latest_frame = self._latest_frame
 
         async with self._processing_lock:

Alternatively, if you want to skip rather than queue: note that asyncio.Lock.acquire() does not take a blocking argument (that is the threading.Lock API), so acquire(blocking=False) cannot be used here. A skip-if-busy variant can instead check the existing _is_processing flag, set and cleared while the lock is held:

        if self._is_processing:
            logger.debug("Moondream processing already in progress, skipping")
            return None

        latest_frame = self._latest_frame

        async with self._processing_lock:
            self._is_processing = True
            try:
                # Convert frame to PIL Image
                ...
            except Exception as e:
                logger.exception(f"Error processing frame: {e}")
                return LLMResponseEvent(original=None, text="", exception=e)
            finally:
                self._is_processing = False

27-242: Consider extracting common logic into a shared base class.

CloudVLM and LocalVLM share substantial implementation patterns: frame buffering, VideoForwarder lifecycle, STT subscription, lock-based processing, and public API structure. This duplication increases maintenance burden and risk of inconsistency.

Consider extracting shared logic into a base class:

class MoondreamVLMBase(llm.VideoLLM):
    """Shared base for Moondream VLM implementations."""
    
    def __init__(self, mode: Literal["vqa", "caption"], max_workers: int):
        # Common initialization: buffer, lock, executor, etc.
        ...
    
    async def watch_video_track(self, track, shared_forwarder):
        # Common video forwarding logic
        ...
    
    # Other shared methods...
    
    @abc.abstractmethod
    def _load_model(self):
        """Subclasses implement model loading (cloud vs. local)."""
        ...

Then CloudVLM and LocalVLM would only implement model-specific logic.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a82e2e0 and d1af35c.

📒 Files selected for processing (1)
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py
🧬 Code graph analysis (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (7)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2)
  • LatestNQueue (6-28)
  • put_latest_nowait (22-28)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (2)
  • _load_model (149-161)
  • close (237-242)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (9)
  • watch_video_track (185-220)
  • _stop_watching_video_track (360-365)
  • _on_frame_received (222-228)
  • _setup_stt_subscription (230-237)
  • on_stt_transcript (236-237)
  • _on_stt_transcript (330-335)
  • _process_frame (254-328)
  • simple_response (337-358)
  • close (367-375)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (8)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (8)

59-94: LGTM!

The video track handling correctly supports both shared and self-managed VideoForwarder instances, with proper lifecycle management and one-time STT subscription setup.


96-102: LGTM!

Frame buffering implementation is correct and matches the established pattern.


104-111: LGTM!

STT subscription setup is correct and properly guarded.


113-126: LGTM!

Stream consumption logic is defensive and correctly handles both expected string chunks and unexpected types with appropriate logging.
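
As a point of reference, a defensive consumer along the lines described could look like this (a sketch, not the reviewed code):

import logging

logger = logging.getLogger(__name__)


def consume_stream(stream) -> str:
    """Drain a Moondream streaming generator and join its string chunks."""
    parts: list[str] = []
    for chunk in stream:
        if isinstance(chunk, str):
            parts.append(chunk)
        else:
            # Unexpected chunk types are logged and skipped rather than raising.
            logger.warning("Skipping unexpected chunk type from Moondream stream: %r", type(chunk))
    return "".join(parts)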


186-191: LGTM!

STT transcript handling correctly validates text presence before processing.


193-214: Unused processors parameter in method signature.

The processors parameter is accepted but never used in the implementation. This appears intentional for API compatibility with the base LLM interface, but consider documenting this in the docstring.

If the parameter is unused by design (for interface compatibility), add a note to the docstring:

     """
     simple_response is a standardized way to create a response.
 
     Args:
         text: The text/question to respond to
-        processors: list of processors (which contain state) about the video/voice AI
+        processors: list of processors (which contain state) about the video/voice AI.
+            Note: Currently unused by CloudVLM.
         participant: optionally the participant object

Alternatively, if processors should be utilized, verify the intended behavior.


216-221: LGTM!

Video track cleanup is correctly implemented.


223-235: LGTM!

Model initialization correctly validates the API key and handles errors appropriately.

Comment on lines +237 to +242
    def close(self):
        """Clean up resources."""
        self._shutdown = True
        if hasattr(self, "executor"):
            self.executor.shutdown(wait=False)
        logger.info("🛑 Moondream Processor closed")


⚠️ Potential issue | 🟠 Major

Add comprehensive resource cleanup.

The close() method is missing critical cleanup steps compared to the LocalVLM implementation. It should stop the video forwarder and clean up the model reference to prevent resource leaks.

Apply this diff:

     def close(self):
         """Clean up resources."""
-        self._shutdown = True
+        if self._video_forwarder is not None:
+            # Note: caller should await _stop_watching_video_track() before close()
+            # or make close() async
+            logger.warning("Video forwarder not stopped before close()")
+        if hasattr(self, "executor"):
             self.executor.shutdown(wait=False)
-        logger.info("🛑 Moondream Processor closed")
+        if hasattr(self, "model") and self.model is not None:
+            del self.model
+            self.model = None
+        logger.info("🛑 Moondream Cloud VLM closed")

Alternatively, consider making close() async and properly awaiting _stop_watching_video_track().

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py
around lines 237 to 242, the close() method only sets _shutdown and shuts the
executor but misses stopping the video forwarder and clearing model references;
update close() to (a) stop the video forwarder by calling and awaiting
_stop_watching_video_track() (make close async if you need to await it) or
explicitly call the synchronous stop routine for the forwarder, (b) null out or
delete the model reference (e.g., self._model = None) to release resources, (c)
ensure executor.shutdown is only called if executor exists (as currently done),
and (d) add error handling/logging around each cleanup step so failures don't
prevent other resources from being freed.

@GetStream GetStream deleted a comment from coderabbitai bot Nov 6, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1)

163-198: Track VideoForwarder ownership to avoid stopping shared instances.

When watch_video_track receives a shared_forwarder, the code stores it in self._video_forwarder but doesn't distinguish ownership. Later, _stop_watching_video_track (lines 332-337) unconditionally calls stop() on any forwarder, which would terminate the shared instance for all other consumers.

Apply this diff to track ownership:

         self._frame_buffer: LatestNQueue[av.VideoFrame] = LatestNQueue(maxlen=10)
         self._latest_frame: Optional[av.VideoFrame] = None
         self._video_forwarder: Optional[VideoForwarder] = None
+        self._owns_video_forwarder = False
         self._stt_subscription_setup = False
         if shared_forwarder is not None:
             self._video_forwarder = shared_forwarder
+            self._owns_video_forwarder = False
             logger.info("🎥 Moondream Local VLM subscribing to shared VideoForwarder")
         else:
             self._video_forwarder = VideoForwarder(
                 track,  # type: ignore[arg-type]
                 max_buffer=10,
                 fps=1.0,
                 name="moondream_local_vlm_forwarder",
             )
+            self._owns_video_forwarder = True
             await self._video_forwarder.start()
     async def _stop_watching_video_track(self) -> None:
         """Stop video forwarding."""
         if self._video_forwarder is not None:
-            await self._video_forwarder.stop()
+            if self._owns_video_forwarder:
+                await self._video_forwarder.stop()
+            else:
+                logger.debug("Shared VideoForwarder left running; owner controls lifecycle")
             self._video_forwarder = None
             logger.info("Stopped video forwarding")
🧹 Nitpick comments (11)
plugins/moondream/tests/test_moondream_local_vlm.py (5)

69-69: Remove redundant warmup call.

The local_vlm_vqa fixture already calls warmup() at line 43, so this second call is unnecessary.

Apply this diff:

-    await local_vlm_vqa.warmup()
     assert local_vlm_vqa.model is not None, "Model must be loaded before test"

72-72: Consider using the public API instead of private members.

Directly manipulating _latest_frame bypasses the intended API. For a more realistic test, consider setting up a proper video track flow or adding a public method to inject frames for testing.


90-90: Remove redundant warmup call.

The local_vlm_caption fixture already calls warmup() at line 58, making this call unnecessary.

Apply this diff:

-    await local_vlm_caption.warmup()
     assert local_vlm_caption.model is not None, "Model must be loaded before test"

93-93: Consider using the public API instead of private members.

Directly setting _latest_frame circumvents the intended interface. For more robust testing, consider using the public video track API or adding a test-specific frame injection method.


99-102: Simplify redundant assertion.

Line 102's check for len(response.text.strip()) > 0 already implies line 99's len(response.text) > 0, so the two assertions overlap. Keep only the stricter non-whitespace check (optionally with an assertion message).

Apply this diff:

     assert response.text is not None
-    assert len(response.text) > 0
     assert response.exception is None
     
-    assert len(response.text.strip()) > 0
+    assert len(response.text.strip()) > 0, "Response should contain non-whitespace text"
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)

149-156: Potentially redundant .to(self.device) call.

Line 156 calls .to(self.device) after from_pretrained already specifies device_map={"": self.device}. The device_map parameter should handle device placement, making the explicit .to() call redundant.

If device_map is working correctly, apply this diff:

         model = AutoModelForCausalLM.from_pretrained(
             self.model_name,
             device_map={"": self.device},
             dtype=self._dtype,
             trust_remote_code=True,
             cache_dir=self.options.model_dir,
             **load_kwargs,
-        ).to(self.device)
+        )
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (5)

104-106: Consider using self.executor for consistency.

While asyncio.to_thread works, using self.executor provides better control over threading behavior and is more consistent with the processor pattern used elsewhere.

Based on learnings

Apply this diff:

-        self.model = await asyncio.to_thread(  # type: ignore[func-returns-value]
-            lambda: self._load_model_sync()
-        )
+        loop = asyncio.get_event_loop()
+        self.model = await loop.run_in_executor(
+            self.executor, self._load_model_sync
+        )

259-259: Consider using self.executor for consistency.

Using asyncio.to_thread here works, but self.executor would provide better control and consistency with the established pattern.

Based on learnings

Apply this diff:

-                result = await asyncio.to_thread(self.model.query, image, text, stream=True)
+                loop = asyncio.get_event_loop()
+                result = await loop.run_in_executor(
+                    self.executor, self.model.query, image, text, True
+                )

Note: run_in_executor does not forward keyword arguments, so stream=True must be passed positionally (if the query API accepts it that way) or the call wrapped in a lambda or functools.partial.


266-266: Consider using self.executor for consistency.

Same issue as line 259—using self.executor provides better threading control.

Based on learnings

Apply this diff:

-                answer = await asyncio.to_thread(self._consume_stream, stream)
+                loop = asyncio.get_event_loop()
+                answer = await loop.run_in_executor(
+                    self.executor, self._consume_stream, stream
+                )

278-278: Consider using self.executor for consistency.

Same pattern as lines 259 and 266—prefer self.executor for better control.

Based on learnings

Apply this diff:

-                result = await asyncio.to_thread(self.model.caption, image, length="normal", stream=True)
+                loop = asyncio.get_event_loop()
+                result = await loop.run_in_executor(
+                    self.executor, self.model.caption, image, "normal", True
+                )

Note: run_in_executor cannot pass keyword arguments, so length and stream are supplied positionally here (or wrap the call in a lambda or functools.partial).


285-285: Consider using self.executor for consistency.

Final instance of asyncio.to_thread—same recommendation as previous occurrences.

Based on learnings

Apply this diff:

-                caption = await asyncio.to_thread(self._consume_stream, stream)
+                loop = asyncio.get_event_loop()
+                caption = await loop.run_in_executor(
+                    self.executor, self._consume_stream, stream
+                )
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ec534fe and fa9847d.

📒 Files selected for processing (4)
  • plugins/moondream/tests/test_moondream_local_vlm.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (7 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py
  • plugins/moondream/tests/test_moondream_local_vlm.py
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py
🧬 Code graph analysis (4)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1)
plugins/moondream/tests/test_moondream_local.py (3)
  • is_available (188-189)
  • is_available (216-217)
  • is_available (244-245)
plugins/moondream/tests/test_moondream_local_vlm.py (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (4)
  • LocalVLM (32-345)
  • warmup (94-97)
  • close (339-345)
  • simple_response (309-330)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (8)
agents-core/vision_agents/core/agents/agents.py (2)
  • AgentOptions (93-103)
  • default_agent_options (110-111)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2)
  • LatestNQueue (6-28)
  • put_latest_nowait (22-28)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1)
  • handle_device (7-11)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)
  • _load_model_sync (129-183)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (3)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (3)
  • parse_detection_bbox (13-31)
  • annotate_detections (48-111)
  • handle_device (7-11)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
  • MoondreamVideoTrack (16-76)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (4)
  • process_video (101-143)
  • _process_and_add_frame (205-234)
  • _run_inference (163-175)
  • _run_detection_sync (177-203)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (19)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1)

7-11: LGTM!

The device selection logic is clean and appropriate. CUDA with float16 for GPU acceleration, CPU with float32 as the fallback—straightforward and correct.
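
In other words, something along these lines (a sketch of the described behavior; the actual handle_device in moondream_utils.py may differ in detail):

import torch


def handle_device() -> tuple[str, torch.dtype]:
    # CUDA with float16 when a GPU is available; otherwise CPU with float32.
    if torch.cuda.is_available():
        return "cuda", torch.float16
    return "cpu", torch.float32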

plugins/moondream/tests/test_moondream_local_vlm.py (3)

20-25: LGTM!

The fixture correctly loads and converts the test image to RGB format.


28-31: LGTM!

Clean conversion from PIL Image to av.VideoFrame.
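
For context, the PyAV conversion the fixture performs is essentially this (the asset path below is hypothetical):

import av
from PIL import Image

# Load the sample image, normalize to RGB, and wrap it in an av.VideoFrame for the VLM.
img = Image.open("golf_course.png").convert("RGB")  # hypothetical test asset
frame = av.VideoFrame.from_image(img)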


34-46: LGTM!

The fixture properly initializes the VLM, warms it up, and ensures cleanup with the finally block.

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (6)

22-23: LGTM!

Import updates correctly reference the new handle_device utility and the video track module.


55-81: LGTM!

The force_cpu parameter is properly integrated and the device selection logic correctly delegates to handle_device() when not forcing CPU.


162-166: LGTM!

The guarded compilation with fallback is a solid defensive pattern for platform compatibility.


185-228: LGTM!

The video processing setup with lazy model loading and VideoForwarder sharing is well-structured and consistent with the cloud processor pattern.


234-278: LGTM!

The inference flow properly offloads synchronous model operations to the executor, maintaining responsiveness in the async event loop.


280-301: LGTM!

Frame processing with conditional annotation and robust error handling looks solid.

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (9)

72-74: MPS override is silent but acceptable.

The code detects MPS and forces CPU mode with a warning. While this silently overrides the user's device choice, it's justified by the model's incompatibility. The warning message is clear.


76-79: LGTM!

Device selection logic is consistent with the detection processor and properly uses the handle_device() utility.


109-161: LGTM!

Model loading with proper authentication handling, device mapping, and guarded compilation. The implementation is solid.


200-206: LGTM!

Frame buffering with proper error handling is straightforward and correct.


208-216: LGTM!

STT subscription setup follows the standard event handling pattern.


217-230: LGTM!

Stream consumption with type checking and defensive handling of unexpected chunk types is well-implemented.


302-307: LGTM!

STT transcript handler correctly delegates to frame processing.


309-330: LGTM!

The simple_response API correctly handles both VQA and caption modes with appropriate error handling.


339-345: LGTM!

Resource cleanup is properly implemented with executor shutdown and model deletion.


    def __init__(
        self,
        mode: Literal["vqa", "caption"] = "vqa",


⚠️ Potential issue | 🟠 Major

Validate mode parameter at initialization.

The mode parameter accepts Literal["vqa", "caption"] but there's no runtime validation. Invalid values could cause issues downstream in _process_frame.

Based on learnings

Apply this diff to add validation:

     def __init__(
             self,
             mode: Literal["vqa", "caption"] = "vqa",
             max_workers: int = 10,
             force_cpu: bool = False,
             model_name: str = "moondream/moondream3-preview",
             options: Optional[AgentOptions] = None,
     ):
         super().__init__()
+        
+        if mode not in ("vqa", "caption"):
+            raise ValueError(f"mode must be 'vqa' or 'caption', got: {mode}")
 
         self.max_workers = max_workers
         self.mode = mode

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
around line 55, the mode parameter is typed as Literal["vqa","caption"] but not
validated at runtime; add a validation in the class initializer (after assigning
self.mode) that checks if self.mode is one of {"vqa", "caption"} and if not
raise a ValueError with a clear message (e.g., "Invalid mode: {self.mode}.
Expected 'vqa' or 'caption'") so invalid values are caught early before
_process_frame is called.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)

40-53: Update the docstring to reflect the new API.

Lines 48-49 document a device parameter that no longer exists in the constructor signature. The API now uses force_cpu instead. This inconsistency will confuse users.

Apply this diff:

         detect_objects: Object(s) to detect. Moondream uses zero-shot detection,
                        so any object string works. Examples: "person", "car",
                        "basketball", ["person", "car", "dog"]. Default: "person"
         fps: Frame processing rate
         interval: Processing interval in seconds
         max_workers: Number of worker threads
-        device: Device to run inference on ('cuda', 'mps', or 'cpu'). 
-               Auto-detects CUDA, then MPS (Apple Silicon), then defaults to CPU.
+        force_cpu: If True, forces CPU device regardless of hardware availability.
+                   Otherwise auto-detects the best available device (CUDA > CPU).
+                   Note: MPS is automatically converted to CPU as Moondream doesn't support MPS.
         model_name: Hugging Face model identifier (default: "moondream/moondream3-preview")
         options: AgentOptions for model directory configuration. If not provided,
                  uses default_agent_options() which defaults to tempfile.gettempdir()
♻️ Duplicate comments (3)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (3)

52-64: Add runtime validation for the mode parameter.

The mode parameter lacks runtime validation. While type hints document the expected values, they don't prevent invalid strings at runtime, which would cause issues downstream in _process_frame.

Apply this diff to add validation:

     def __init__(
             self,
             mode: Literal["vqa", "caption"] = "vqa",
             max_workers: int = 10,
             force_cpu: bool = False,
             model_name: str = "moondream/moondream3-preview",
             options: Optional[AgentOptions] = None,
     ):
         super().__init__()
+        
+        if mode not in ("vqa", "caption"):
+            raise ValueError(f"mode must be 'vqa' or 'caption', got: {mode}")
 
         self.max_workers = max_workers
         self.mode = mode

166-202: Track VideoForwarder ownership to avoid terminating shared resources.

When watch_video_track receives a shared_forwarder, the current implementation doesn't track ownership. Later, _stop_watching_video_track (line 338) unconditionally calls stop() on the forwarder, which terminates the shared resource for all other consumers.

Apply this diff to track ownership:

     def __init__(
             ...
     ):
         ...
         self._video_forwarder: Optional[VideoForwarder] = None
+        self._owns_video_forwarder = False
         self._stt_subscription_setup = False
         ...
         
     async def watch_video_track(
             self,
             track: aiortc.mediastreams.MediaStreamTrack,
             shared_forwarder: Optional[VideoForwarder] = None
     ) -> None:
         ...
         if shared_forwarder is not None:
             self._video_forwarder = shared_forwarder
+            self._owns_video_forwarder = False
             logger.info("🎥 Moondream Local VLM subscribing to shared VideoForwarder")
             ...
         else:
             self._video_forwarder = VideoForwarder(
                 track,  # type: ignore[arg-type]
                 max_buffer=10,
                 fps=1.0,
                 name="moondream_local_vlm_forwarder",
             )
+            self._owns_video_forwarder = True
             ...

335-340: Only stop owned VideoForwarder instances.

This method unconditionally stops the forwarder, but should only stop forwarders that LocalVLM created, not borrowed ones.

Apply this diff:

     async def _stop_watching_video_track(self) -> None:
         """Stop video forwarding."""
         if self._video_forwarder is not None:
-            await self._video_forwarder.stop()
+            if self._owns_video_forwarder:
+                await self._video_forwarder.stop()
+            else:
+                logger.debug("Shared VideoForwarder left running; owner controls lifecycle")
             self._video_forwarder = None
             logger.info("Stopped video forwarding")
🧹 Nitpick comments (3)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (2)

104-106: Consider consistency in thread pool usage.

While asyncio.to_thread does use a thread pool internally, it uses Python's default thread pool rather than the self.executor created at line 87. This means the max_workers parameter has no effect on these operations.

If the intent is to control concurrency with max_workers, use the configured executor:

-        self.model = await asyncio.to_thread(  # type: ignore[func-returns-value]
-            lambda: self._load_model_sync()
-        )
+        loop = asyncio.get_event_loop()
+        self.model = await loop.run_in_executor(
+            self.executor,
+            self._load_model_sync
+        )

The same pattern applies to lines 262, 269, 281, and 288. If the current approach is intentional (using the default pool), consider removing the unused self.executor to clarify the design.
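
For readers comparing the two approaches, here is a minimal standalone sketch (not taken from the plugin) of why the distinction matters: asyncio.to_thread always uses the event loop's default pool, while run_in_executor can target a bounded ThreadPoolExecutor so a max_workers-style limit actually caps concurrency.

import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)  # bound chosen purely for illustration

def blocking_work(label: str) -> str:
    return f"done: {label}"

async def main() -> None:
    # Default pool: the executor above is ignored, so its max_workers has no effect.
    print(await asyncio.to_thread(blocking_work, "to_thread"))
    # Configured pool: at most two of these can run concurrently.
    loop = asyncio.get_running_loop()
    print(await loop.run_in_executor(executor, blocking_work, "run_in_executor"))

asyncio.run(main())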


342-348: Consider waiting for executor tasks during shutdown.

Using wait=False in executor.shutdown() may leave tasks running when the model is deleted, potentially causing errors during cleanup.

Consider waiting for tasks to complete gracefully:

     def close(self):
         """Clean up resources."""
-        self.executor.shutdown(wait=False)
+        self.executor.shutdown(wait=True)
         if self.model is not None:
             del self.model
             self.model = None
         logger.info("🛑 Moondream Local VLM closed")

Alternatively, if immediate shutdown is required, consider using a brief timeout to balance responsiveness with graceful completion.

plugins/moondream/tests/test_moondream_local.py (1)

263-269: Remove redundant test code.

The test at line 264 uses force_cpu=True, which doesn't verify MPS-to-CPU conversion—it bypasses the auto-detection logic entirely. The comment claims it tests "explicit MPS parameter," but the old API no longer exists. This segment duplicates what line 256 already verifies.

Apply this diff:

-        # Also test explicit MPS parameter
-        processor2 = LocalDetectionProcessor(force_cpu=True)
-        try:
-            # Verify explicit MPS is also converted to CPU
-            assert processor2.device == "cpu"
-        finally:
-            processor2.close()
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between fa9847d and f1ba327.

📒 Files selected for processing (3)
  • plugins/moondream/tests/test_moondream_local.py (4 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (7 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/moondream/tests/test_moondream_local.py
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py
🔇 Additional comments (9)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (2)

235-303: Excellent async lock handling and error management.

The non-blocking lock acquisition with timeout (lines 244-249) elegantly prevents concurrent processing without blocking callers. The frame processing logic properly handles both VQA and caption modes with good error handling and event emission.
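
For readers skimming the thread, a minimal standalone sketch of that pattern follows; the class and attribute names are illustrative, not the plugin's actual code. The idea is to drop the frame when inference is already in flight instead of queueing behind it.

import asyncio
from typing import Optional

class FrameWorker:
    def __init__(self) -> None:
        self._processing_lock = asyncio.Lock()

    async def process_latest(self, frame) -> Optional[str]:
        # Skip the frame if a previous one is still being processed.
        if self._processing_lock.locked():
            return None
        try:
            await asyncio.wait_for(self._processing_lock.acquire(), timeout=0.5)
        except asyncio.TimeoutError:
            return None
        try:
            return await self._run_inference(frame)
        finally:
            self._processing_lock.release()

    async def _run_inference(self, frame) -> str:
        await asyncio.sleep(0.1)  # stand-in for model inference
        return "answer"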


109-164: Well-structured model loading with helpful error guidance.

The authentication error handling is particularly helpful, providing clear instructions for users encountering gated model access. The fallback when compilation fails (lines 143-147) ensures robustness.
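
As a rough illustration of that style of guidance (the exception type and message text are assumptions, not the plugin's exact code):

from huggingface_hub.utils import GatedRepoError
from transformers import AutoModelForCausalLM

def load_gated_model(model_name: str):
    try:
        return AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
    except GatedRepoError as e:
        # Gated repos fail until the user accepts the license and authenticates.
        raise RuntimeError(
            f"{model_name} is gated: request access on Hugging Face, then run "
            "`huggingface-cli login` or set the HF_TOKEN environment variable."
        ) from e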

plugins/moondream/tests/test_moondream_local.py (3)

42-48: LGTM: Test fixture correctly uses force_cpu=True.

This ensures deterministic behavior in CI environments and avoids hardware dependencies during testing.
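
A hypothetical shape for such a fixture (import path and helper names assumed; the real test module may differ):

import pytest
from vision_agents.plugins.moondream import LocalDetectionProcessor

@pytest.fixture
def local_processor():
    # force_cpu=True keeps the test deterministic on runners without a GPU.
    processor = LocalDetectionProcessor(force_cpu=True)
    yield processor
    processor.close()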


271-277: LGTM: Explicit CPU test is correct.

Properly validates the force_cpu=True flag behavior.


279-289: LGTM: CUDA auto-detection test is correct.

Properly validates that the default constructor selects CUDA when available.

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (4)

78-81: Device resolution logic is clean and well-structured.

The conditional handling of force_cpu provides a clear override path while delegating auto-detection to the centralized handle_device() utility.


161-167: LGTM: Non-fatal compilation fallback is good defensive programming.

The try-except pattern ensures that compilation failures don't prevent model usage, with appropriate warning logs for debugging.
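
A compact sketch of that pattern (the plugin may call a model-specific compile hook rather than torch.compile):

import logging
import torch

logger = logging.getLogger(__name__)

def maybe_compile(model: torch.nn.Module) -> torch.nn.Module:
    try:
        return torch.compile(model)  # optional speed-up
    except Exception as e:  # compilation is best-effort, never fatal
        logger.warning("Model compilation failed, continuing without it: %s", e)
        return model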


22-23: Verify the handle_device utility exists and returns the expected types.

The code imports handle_device from moondream_utils and expects it to return a tuple of (device, dtype). Ensure this function is properly defined and its return types match the usage at line 81.
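
For context, a minimal sketch of what such a utility is expected to return, based on how it is used here (the actual moondream_utils implementation may differ):

import torch

def handle_device() -> tuple[torch.device, torch.dtype]:
    # Prefer CUDA with half precision; otherwise fall back to CPU with full precision.
    if torch.cuda.is_available():
        return torch.device("cuda"), torch.float16
    return torch.device("cpu"), torch.float32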


55-65: Breaking API change: Verify all callers are updated.

The constructor signature changed from accepting a device string parameter to a force_cpu boolean flag. This is a breaking change for any existing code using the old API.
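
For example, a caller written against the old signature would need an update along these lines (illustrative only; import path assumed):

from vision_agents.plugins.moondream import LocalDetectionProcessor

# Before (no longer supported):
#   processor = LocalDetectionProcessor(device="cuda")
# After:
processor = LocalDetectionProcessor()                     # auto-detects CUDA, falls back to CPU
cpu_processor = LocalDetectionProcessor(force_cpu=True)   # explicit CPU override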

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (8)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (7)

104-106: Prefer self.executor over asyncio.to_thread for consistency.

Line 104 uses asyncio.to_thread, but the class maintains self.executor (Line 87). For consistency and explicit thread pool management, use self.executor as suggested in past reviews.

Based on learnings

Apply this diff:

-        self.model = await asyncio.to_thread(  # type: ignore[func-returns-value]
-            lambda: self._load_model_sync()
-        )
+        loop = asyncio.get_event_loop()
+        self.model = await loop.run_in_executor(self.executor, self._load_model_sync)

263-263: Prefer self.executor over asyncio.to_thread for VQA query.

Line 263 uses asyncio.to_thread for model inference. Use self.executor consistently as suggested in past reviews.

Based on learnings

Apply this diff:

-                result = await asyncio.to_thread(self.model.query, image, text, stream=True)
+                loop = asyncio.get_event_loop()
+                # lambda needed because run_in_executor does not forward keyword arguments
+                result = await loop.run_in_executor(
+                    self.executor, lambda: self.model.query(image, text, stream=True)
+                )

270-270: Prefer self.executor for stream consumption.

Line 270 uses asyncio.to_thread. Use self.executor for consistency.

Based on learnings

Apply this diff:

-                answer = await asyncio.to_thread(self._consume_stream, stream)
+                loop = asyncio.get_event_loop()
+                answer = await loop.run_in_executor(self.executor, self._consume_stream, stream)

282-289: Prefer self.executor for caption inference and stream consumption.

Lines 282 and 289 use asyncio.to_thread. Use self.executor for consistency.

Based on learnings

Apply this diff:

-                result = await asyncio.to_thread(self.model.caption, image, length="normal", stream=True)
+                loop = asyncio.get_event_loop()
+                # lambda needed because run_in_executor does not forward keyword arguments
+                result = await loop.run_in_executor(
+                    self.executor, lambda: self.model.caption(image, length="normal", stream=True)
+                )
 
                 if isinstance(result, dict) and "caption" in result:
                     stream = result["caption"]
                 else:
                     stream = result
 
-                caption = await asyncio.to_thread(self._consume_stream, stream)
+                caption = await loop.run_in_executor(self.executor, self._consume_stream, stream)

52-79: Add runtime validation for the mode parameter.

The mode parameter accepts Literal["vqa", "caption"] but lacks runtime validation. Invalid values passed at runtime (e.g., from configuration files) would cause issues in _process_frame at Line 258.

Based on learnings

Apply this diff:

     def __init__(
             self,
             mode: Literal["vqa", "caption"] = "vqa",
             max_workers: int = 10,
             force_cpu: bool = False,
             model_name: str = "moondream/moondream3-preview",
             options: Optional[AgentOptions] = None,
     ):
         super().__init__()
+        
+        if mode not in ("vqa", "caption"):
+            raise ValueError(f"mode must be 'vqa' or 'caption', got: {mode}")
 
         self.max_workers = max_workers
         self.mode = mode

166-197: Don't stop a borrowed VideoForwarder—track ownership.

Line 340 in _stop_watching_video_track calls stop() on self._video_forwarder regardless of whether it's shared. Stopping a shared forwarder terminates video for all consumers. Track ownership and only stop forwarders created by this instance.

Based on learnings

Apply this diff:

         self._video_forwarder: Optional[VideoForwarder] = None
+        self._owns_video_forwarder = False
         self._stt_subscription_setup = False
@@
         if shared_forwarder is not None:
             self._video_forwarder = shared_forwarder
+            self._owns_video_forwarder = False
             logger.info("🎥 Moondream Local VLM subscribing to shared VideoForwarder")
@@
         else:
             self._video_forwarder = VideoForwarder(
                 track,  # type: ignore[arg-type]
                 max_buffer=10,
                 fps=1.0,
                 name="moondream_local_vlm_forwarder",
             )
+            self._owns_video_forwarder = True
             await self._video_forwarder.start()

And update _stop_watching_video_track:

     async def _stop_watching_video_track(self) -> None:
         """Stop video forwarding."""
         if self._video_forwarder is not None:
-            await self._video_forwarder.stop()
+            if self._owns_video_forwarder:
+                await self._video_forwarder.stop()
+            else:
+                logger.debug("Shared VideoForwarder left running; owner controls lifecycle")
             self._video_forwarder = None
             logger.info("Stopped video forwarding")

344-350: Guard executor shutdown to prevent exceptions.

Line 346 calls self.executor.shutdown(wait=False) without checking if the executor exists. If close() is called before initialization completes or multiple times, this could raise an AttributeError.

Based on learnings

Apply this diff:

     def close(self):
         """Clean up resources."""
-        self.executor.shutdown(wait=False)
+        if self.executor is not None:
+            self.executor.shutdown(wait=False)
         if self.model is not None:
             del self.model
             self.model = None
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)

154-161: Redundant device placement remains—Lines 156 and 161 conflict.

Line 156 passes device_map={"": self._device} (a torch.device object), then Line 161 calls .to(self._device). This redundancy still exists:

  1. device_map expects string identifiers ("cuda", "cpu"), not torch.device objects
  2. When device_map is set, the model is already placed—calling .to() is unnecessary

Apply this diff:

             model = AutoModelForCausalLM.from_pretrained(
                 self.model_name,
-                device_map={"": self._device},
-                dtype=self._dtype,
                 trust_remote_code=True,
                 cache_dir=self.options.model_dir,
                 **load_kwargs,
-            ).to(self._device)  # type: ignore[arg-type]
+            )
+            
+            # Place model on device with specified dtype
+            model = model.to(device=self._device, dtype=self._dtype)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f9c91e9 and a801788.

📒 Files selected for processing (4)
  • agents-core/vision_agents/core/agents/agents.py (2 hunks)
  • plugins/moondream/example/moondream_vlm_example.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (7 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • plugins/moondream/example/moondream_vlm_example.py
  • agents-core/vision_agents/core/agents/agents.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
🧬 Code graph analysis (2)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (3)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (3)
  • parse_detection_bbox (13-31)
  • annotate_detections (48-111)
  • handle_device (7-11)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
  • MoondreamVideoTrack (16-76)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (7)
agents-core/vision_agents/core/agents/agents.py (2)
  • AgentOptions (92-102)
  • default_agent_options (109-110)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2)
  • LatestNQueue (6-28)
  • put_latest_nowait (22-28)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1)
  • handle_device (7-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (5)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (4)

22-23: LGTM! Import updates align with new device handling.

The addition of handle_device and the updated MoondreamVideoTrack import path are appropriate for the refactored device management.


55-81: Device handling logic is clean and well-structured.

The force_cpu parameter provides explicit CPU override while handle_device() manages automatic device selection. The fallback to CPU when forcing is clear.


113-116: Device property provides clean public access.

Exposing device type as a string property is a good ergonomic choice for logging and debugging.
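
Roughly, the property exposes the torch.device's type as a plain string; a sketch with assumed attribute names:

import torch

class DeviceInfo:
    def __init__(self, force_cpu: bool = False) -> None:
        use_cuda = torch.cuda.is_available() and not force_cpu
        self._device = torch.device("cuda" if use_cuda else "cpu")

    @property
    def device(self) -> str:
        # Callers only need "cuda" or "cpu" for logging and assertions.
        return self._device.type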


190-232: LGTM! VideoForwarder integration follows the shared forwarder pattern correctly.

The logic properly handles both shared and owned forwarder cases, subscribing at the specified fps when shared and creating its own otherwise. The consumer callback setup is appropriate.

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1)

72-79: LGTM! Device handling properly manages MPS incompatibility.

The MPS detection with CPU fallback and clear warning is appropriate. The force_cpu and handle_device() flow mirrors the detection processor correctly.

Comment on lines +131 to +136
            model = AutoModelForCausalLM.from_pretrained(
                self.model_name,
                device_map={"": self.device},
                dtype=self._dtype,
                **load_kwargs,
            )

⚠️ Potential issue | 🟠 Major

Redundant device placement—device_map with .to() on Line 139.

Line 133 sets device_map={"": self.device} (a torch.device object), and Line 139 calls .to("cpu") conditionally. The device_map expects string identifiers, not torch.device objects, and the conditional .to() adds complexity.

Apply this diff:

             model = AutoModelForCausalLM.from_pretrained(
                 self.model_name,
-                device_map={"": self.device},
-                dtype=self._dtype,
                 **load_kwargs,
             )
 
-            if self.force_cpu:
-                model.to("cpu")  # type: ignore[arg-type]
+            # Place model on device with specified dtype
+            model = model.to(device=self.device, dtype=self._dtype)
             model.eval()
🤖 Prompt for AI Agents
In plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
around lines 131 to 136, remove the device_map argument passed to
AutoModelForCausalLM.from_pretrained (device_map={"": self.device}) because
device_map expects string identifiers and we already perform explicit placement
with .to(...) afterwards; call from_pretrained with self.model_name and
**load_kwargs only, then move the model to the desired device and dtype with a
single .to(device=self.device, dtype=self._dtype) call.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (4)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)

154-161: Remove redundant device placement.

Lines 156 and 161 specify device placement twice: device_map={"": self._device} expects string identifiers but receives a torch.device object, and the subsequent .to(self._device) is redundant when device_map is set.

Apply this diff:

             model = AutoModelForCausalLM.from_pretrained(
                 self.model_name,
-                device_map={"": self._device},
-                dtype=self._dtype,
                 trust_remote_code=True,
                 cache_dir=self.options.model_dir,
                 **load_kwargs,
-            ).to(self._device)
+            )
+            
+            # Move model to target device with specified dtype
+            model = model.to(device=self._device, dtype=self._dtype)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (3)

54-65: Validate mode parameter at initialization.

The mode parameter accepts Literal["vqa", "caption"] but lacks runtime validation. Invalid values could cause issues downstream in _process_frame.

Apply this diff:

     def __init__(
             self,
             mode: Literal["vqa", "caption"] = "vqa",
             max_workers: int = 10,
             force_cpu: bool = False,
             model_name: str = "moondream/moondream3-preview",
             options: Optional[AgentOptions] = None,
     ):
         super().__init__()
+        
+        if mode not in ("vqa", "caption"):
+            raise ValueError(f"mode must be 'vqa' or 'caption', got: {mode}")
 
         self.max_workers = max_workers
         self.mode = mode

133-142: Remove redundant device placement.

Line 135 sets device_map={"": self.device} and line 141 conditionally calls .to("cpu"). When device_map is specified, the model is already placed; the subsequent .to() is unnecessary.

Apply this diff:

             model = AutoModelForCausalLM.from_pretrained(
                 self.model_name,
-                device_map={"": self.device},
-                dtype=self._dtype,
                 **load_kwargs,
             )
 
-            if self.force_cpu:
-                model.to("cpu")  # type: ignore[arg-type]
+            # Place model on device with specified dtype
+            model = model.to(device=self.device, dtype=self._dtype)
             model.eval()

168-199: Track VideoForwarder ownership to avoid terminating borrowed forwarders.

When watch_video_track is called with a shared_forwarder, line 182 borrows it, but _stop_watching_video_track (lines 339-344) stops it unconditionally. This shuts down the shared forwarder for every other consumer.

Apply this diff:

         self._frame_buffer: LatestNQueue[av.VideoFrame] = LatestNQueue(maxlen=10)
         self._latest_frame: Optional[av.VideoFrame] = None
         self._video_forwarder: Optional[VideoForwarder] = None
+        self._owns_video_forwarder = False
         self._stt_subscription_setup = False
@@
         if shared_forwarder is not None:
             self._video_forwarder = shared_forwarder
+            self._owns_video_forwarder = False
             logger.info("🎥 Moondream Local VLM subscribing to shared VideoForwarder")
@@
         else:
             self._video_forwarder = VideoForwarder(
                 track,  # type: ignore[arg-type]
                 max_buffer=10,
                 fps=1.0,
                 name="moondream_local_vlm_forwarder",
             )
+            self._owns_video_forwarder = True
             await self._video_forwarder.start()
@@
     async def _stop_watching_video_track(self) -> None:
         """Stop video forwarding."""
         if self._video_forwarder is not None:
-            await self._video_forwarder.stop()
+            if self._owns_video_forwarder:
+                await self._video_forwarder.stop()
+            else:
+                logger.debug("Shared VideoForwarder left running; owner controls lifecycle")
             self._video_forwarder = None
🧹 Nitpick comments (4)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (2)

244-252: Consider simplifying lock acquisition.

Lines 249-252 manually acquire the lock with a try-except, then release it in a finally block (lines 306-307). This pattern is more complex than necessary—the initial lock check at line 244 already guards against concurrent access.

Consider using the lock as a context manager throughout:

-        try:
-            await self._processing_lock.acquire()
-        except Exception as e:
-            logger.warning(f"Failed to acquire lock: {e}")
-            return None
-
-        latest_frame = self._latest_frame
-
-        try:
+        async with self._processing_lock:
+            latest_frame = self._latest_frame
+            try:
             frame_array = latest_frame.to_ndarray(format="rgb24")
             ...
-        except Exception as e:
-            logger.exception(f"Error processing frame: {e}")
-            return LLMResponseEvent(original=None, text="", exception=e)
-        finally:
-            if self._processing_lock.locked():
-                self._processing_lock.release()
+            except Exception as e:
+                logger.exception(f"Error processing frame: {e}")
+                return LLMResponseEvent(original=None, text="", exception=e)

260-300: Use the configured executor for thread pool management.

Lines 265, 272, 284, and 291 use asyncio.to_thread(), which runs work on the event loop's default thread pool rather than the configured one. The class already initializes self.executor (a ThreadPoolExecutor at line 89) for this purpose. Routing inference through that executor lets the max_workers setting actually bound concurrency.

Apply this pattern throughout:

-                result = await asyncio.to_thread(self.model.query, image, text, stream=True)
+                loop = asyncio.get_event_loop()
+                # lambda needed because run_in_executor does not forward keyword arguments
+                result = await loop.run_in_executor(
+                    self.executor, lambda: self.model.query(image, text, stream=True)
+                )
                
                 if isinstance(result, dict) and "answer" in result:
                     stream = result["answer"]
                 else:
                     stream = result
 
-                answer = await asyncio.to_thread(self._consume_stream, stream)
+                answer = await loop.run_in_executor(self.executor, self._consume_stream, stream)

(Apply similar changes to the caption mode at lines 284 and 291.)

plugins/moondream/example/README.md (1)

1-2: Consider adding a comma for clarity.

Static analysis suggests: "Moondream example**,** Please see root readme for details."

plugins/moondream/README.md (1)

168-168: Format bare URL as Markdown link.

Static analysis flags the bare URL at line 168. Consider formatting it as a proper Markdown link for better readability:

-- Request access at https://huggingface.co/moondream/moondream3-preview
+- Request access at [huggingface.co/moondream/moondream3-preview](https://huggingface.co/moondream/moondream3-preview)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a801788 and 0de1cdd.

📒 Files selected for processing (8)
  • plugins/moondream/README.md (5 hunks)
  • plugins/moondream/example/README.md (1 hunks)
  • plugins/moondream/example/moondream_vlm_example.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (2 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (7 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/moondream/example/moondream_vlm_example.py
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py
🧬 Code graph analysis (4)
plugins/moondream/example/moondream_vlm_example.py (3)
agents-core/vision_agents/core/agents/agents.py (7)
  • Agent (125-1355)
  • create_user (682-694)
  • create_call (696-701)
  • subscribe (307-319)
  • simple_response (292-305)
  • join (471-554)
  • finish (556-589)
agents-core/vision_agents/core/agents/agent_launcher.py (1)
  • AgentLauncher (18-125)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1)
  • CloudVLM (27-249)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (8)
agents-core/vision_agents/core/agents/agents.py (6)
  • AgentOptions (92-102)
  • default_agent_options (109-110)
  • subscribe (307-319)
  • join (471-554)
  • simple_response (292-305)
  • close (591-663)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (437-458)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
agents-core/vision_agents/core/utils/queue.py (2)
  • LatestNQueue (6-28)
  • put_latest_nowait (22-28)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1)
  • handle_device (7-11)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (9)
  • watch_video_track (66-101)
  • _on_frame_received (103-109)
  • _setup_stt_subscription (111-118)
  • on_stt_transcript (117-118)
  • _on_stt_transcript (193-198)
  • _consume_stream (120-133)
  • _process_frame (135-191)
  • simple_response (200-221)
  • close (244-249)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (3)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (3)
  • parse_detection_bbox (13-31)
  • annotate_detections (48-111)
  • handle_device (7-11)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
  • MoondreamVideoTrack (16-79)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (1)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
  • MoondreamVideoTrack (16-79)
🪛 LanguageTool
plugins/moondream/README.md

[uncategorized] ~8-~8: Possible missing comma found.
Context: ...s Choose between cloud-hosted or local processing depending on your needs. When running l...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~165-~165: Possible missing article found.
Context: ... the model from HuggingFace and runs on device. It supports both VQA and captioning mo...

(AI_HYDRA_LEO_MISSING_THE)


[uncategorized] ~234-~234: Possible missing comma found.
Context: ...ry configuration. If not provided, uses default which defaults to tempfile.gettempdir()...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~240-~240: Loose punctuation mark.
Context: ...e. ### CloudVLM Parameters - api_key: str - API key for Moondream Cloud API. ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~241-~241: Loose punctuation mark.
Context: ..._API_KEY` environment variable. - `mode`: Literal["vqa", "caption"] - "vqa" for v...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~248-~248: Loose punctuation mark.
Context: ...mits. ### LocalVLM Parameters - mode: Literal["vqa", "caption"] - "vqa" for v...

(UNLIKELY_OPENING_PUNCTUATION)

plugins/moondream/example/README.md

[typographical] ~1-~1: Consider adding a comma here.
Context: ## Moondream example Please see root readme for details.

(PLEASE_COMMA)

🪛 markdownlint-cli2 (0.18.1)
plugins/moondream/README.md

168-168: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (13)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (2)

19-19: LGTM—Import path correctly updated.

The import path for MoondreamVideoTrack now reflects the module reorganization into the detection subpackage.


31-47: LGTM—Documentation improvements enhance clarity.

The updated docstring provides clearer guidance on rate limits and default values, making the API more discoverable.

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (3)

55-65: LGTM—Improved API with force_cpu parameter.

Replacing the device string parameter with a force_cpu boolean simplifies the API and enables automatic device detection, improving the user experience.


78-81: LGTM—Device selection logic is correct.

The conditional device selection properly handles both forced CPU mode and automatic detection.


113-116: LGTM—Clean device property for external access.

The property provides a string representation of the internal torch.device, maintaining a clean public interface.

plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (2)

74-76: LGTM—MPS incompatibility handled gracefully.

The automatic CPU fallback for MPS devices is appropriate given the model's CUDA dependencies, and the warning message clearly explains the behavior.
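
A standalone sketch of that fallback logic (function name, messages, and ordering are illustrative):

import logging
import torch

logger = logging.getLogger(__name__)

def resolve_device(force_cpu: bool = False) -> torch.device:
    if force_cpu:
        return torch.device("cpu")
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        # Moondream's model code targets CUDA or CPU, so MPS is downgraded with a warning.
        logger.warning("MPS detected but not supported by Moondream; falling back to CPU")
    return torch.device("cpu")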


346-352: LGTM—Resource cleanup is thorough.

The close() method properly shuts down the executor and cleans up the model reference.

plugins/moondream/example/moondream_vlm_example.py (3)

15-29: LGTM—Agent creation is straightforward.

The create_agent function properly constructs an agent with CloudVLM, handling the API key via environment variable.


32-49: LGTM—Call joining and event handling are correct.

The join_call function properly sets up the call, subscribes to participant events, and triggers the agent response. The 2-second delay before prompting is a reasonable grace period.


52-53: LGTM—CLI integration follows standard pattern.

The main entry point correctly wires the launcher with the agent creation and call-joining functions.

plugins/moondream/README.md (3)

12-14: LGTM—Installation command uses standard extras syntax.

The updated installation command vision-agents[moondream] follows Python packaging conventions for optional dependencies.


111-210: LGTM—VLM examples are comprehensive and clear.

The Quick Start section provides complete, runnable examples for both CloudVLM and LocalVLM, with clear explanations of the VQA and caption modes.
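
For quick orientation, the two entry points reduce to something like this; constructor arguments are taken from the parameter docs below, and exact defaults may differ:

import os
from vision_agents.plugins.moondream import CloudVLM, LocalVLM

# Cloud-hosted: needs a Moondream API key (otherwise read from the MOONDREAM_API_KEY env var).
cloud_vlm = CloudVLM(api_key=os.environ["MOONDREAM_API_KEY"], mode="vqa")

# Local: downloads moondream3-preview from Hugging Face; force_cpu=True skips GPU auto-detection.
local_vlm = LocalVLM(mode="caption", force_cpu=True)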


238-254: LGTM—Parameter documentation is thorough.

The CloudVLM and LocalVLM configuration sections clearly document all parameters with types, defaults, and usage guidance. The MPS-to-CPU conversion note is particularly helpful.

@Nash0x7E2 Nash0x7E2 merged commit 290849e into main Nov 7, 2025
6 checks passed
@Nash0x7E2 Nash0x7E2 deleted the feat/moondream-vlm branch November 7, 2025 01:23