Feat: Add support for Moondream VLM functions (#154)
* Scaffolding setup for Moondream VLM
* Basic (broken) impl
* Fix parsing
* Add some handling around processing
* Basic Moondream VLM example
* Remove extra character
* Clean up folder structure
* WIP local version
* Fix broken track imports
* LocalVLM tests
* Unused param
* Ensure processors are warmed up during launch
* Ruff and MyPy
* PR review - CloudVLM
* Add missing debug log for processor warmup
* Improve local device detection
* Formatting and clean up
* More clean up
* Fix bug with processing lock
* Ruff and MyPy final checks
* Expose device for verification
* Simplify example
* Update public doc strings
* Update readme
* unused import
plugins/moondream/README.md
+151 -9 (151 additions & 9 deletions)
@@ -1,27 +1,48 @@
 # Moondream Plugin
 
-This plugin provides Moondream 3 detection capabilities for vision-agents, enabling real-time zero-shot object detection on video streams. Choose between cloud-hosted or local processing depending on your needs.
+This plugin provides Moondream 3 vision capabilities for vision-agents, including:
+- **Object Detection**: Real-time zero-shot object detection on video streams
+- **Visual Question Answering (VQA)**: Answer questions about video frames
+- **Image Captioning**: Generate descriptions of video frames
+
+Choose between cloud-hosted or local processing depending on your needs. When running locally, we recommend you do so on CUDA-enabled devices.
 
 ## Installation
 
 ```bash
-uv add vision-agents-plugins-moondream
+uv add "vision-agents[moondream]"
 ```
 
-## Choosing the Right Processor
+## Choosing the Right Component
+
+### Detection Processors
 
-### CloudDetectionProcessor (Recommended for Most Users)
+#### CloudDetectionProcessor (Recommended for Most Users)
 - **Use when:** You want a simple setup with no infrastructure management
 - **Pros:** No model download, no GPU required, automatic updates
 - **Cons:** Requires API key, 2 RPS rate limit by default (can be increased)
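The 2 RPS default means that frames arriving faster than every 0.5 seconds cannot all be sent to the hosted API. The following is a minimal sketch of client-side throttling under that budget. `MinIntervalThrottle` is a hypothetical helper written for illustration; it is not part of the plugin, and this README does not say whether the processor already throttles internally.

```python
import time


class MinIntervalThrottle:
    """Hypothetical helper: allow at most `rate` calls per second by spacing them out."""

    def __init__(self, rate: float, clock=time.monotonic) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self._min_interval = 1.0 / rate
        self._clock = clock
        self._last_allowed = None  # timestamp of the last call that was let through

    def try_acquire(self) -> bool:
        """Return True if a call may be made now, False if it should be skipped."""
        now = self._clock()
        if self._last_allowed is None or now - self._last_allowed >= self._min_interval:
            self._last_allowed = now
            return True
        return False


# Assumed usage: with a 2 RPS budget, a frame is only forwarded
# when try_acquire() returns True; other frames are dropped.
throttle = MinIntervalThrottle(rate=2.0)
```

Dropping frames rather than queueing them keeps detections close to real time, which is usually the better trade-off for live video.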
@@ +115,23 @@
+The `CloudVLM` uses Moondream's hosted API for visual question answering and captioning. It automatically processes video frames and responds to questions asked via STT (Speech-to-Text).
+
+```python
+import asyncio
+import os
+from dotenv import load_dotenv
+from vision_agents.core import User, Agent, cli
+from vision_agents.core.agents import AgentLauncher
+from vision_agents.plugins import deepgram, getstream, elevenlabs, moondream
+from vision_agents.core.events import CallSessionParticipantJoinedEvent
+
+load_dotenv()
+
+async def create_agent(**kwargs) -> Agent:
+    # Create a cloud VLM for visual question answering
+    llm = moondream.CloudVLM(
+        api_key=os.getenv("MOONDREAM_API_KEY"),  # or set MOONDREAM_API_KEY env var
+        mode="vqa",  # or "caption" for image captioning
+    )
+
+    agent = Agent(
+        edge=getstream.Edge(),
+        agent_user=User(name="My happy AI friend", id="agent"),
@@ -108,11 +230,29 @@
 - `interval`: int - Processing interval in seconds (default: 0)
 - `max_workers`: int - Thread pool size for CPU-intensive operations (default: 10)
-- `device`: str - Device to run inference on ('cuda', 'mps', or 'cpu'). Auto-detects CUDA, then MPS (Apple Silicon), then defaults to CPU. Default: `None` (auto-detect)
+- `force_cpu`: bool - If True, force CPU usage even if CUDA/MPS is available. Otherwise, auto-detects CUDA, then MPS (Apple Silicon), then defaults to CPU. We recommend running on CUDA for best performance. (default: False)
 - `model_name`: str - Hugging Face model identifier (default: "moondream/moondream3-preview")
 - `options`: AgentOptions - Model directory configuration. If not provided, uses the default options, whose model directory defaults to tempfile.gettempdir()
 
 **Performance:** Performance will vary depending on your hardware configuration. CUDA is recommended for best performance on NVIDIA GPUs. The model will be downloaded from Hugging Face on first use.
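The device selection order described for `force_cpu` (CUDA, then MPS, then CPU) can be sketched as follows. This is an illustration only: `pick_device` and `detect_device` are hypothetical names, and the use of PyTorch for the availability checks is an assumption, since the README does not show how the plugin detects devices.

```python
def pick_device(force_cpu: bool, cuda_available: bool, mps_available: bool) -> str:
    """Hypothetical helper mirroring the documented order: CUDA, then MPS, then CPU."""
    if force_cpu:
        return "cpu"
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device(force_cpu: bool = False) -> str:
    """Query PyTorch (assumed backend) if it is installed; otherwise fall back to CPU."""
    try:
        import torch
    except ImportError:
        return pick_device(force_cpu, cuda_available=False, mps_available=False)
    return pick_device(
        force_cpu,
        cuda_available=torch.cuda.is_available(),
        mps_available=torch.backends.mps.is_available(),
    )
```

Keeping the decision in a pure function makes the ordering easy to check without a GPU present.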
 
+### CloudVLM Parameters
+
+- `api_key`: str - API key for Moondream Cloud API. If not provided, will attempt to read from `MOONDREAM_API_KEY` environment variable.
+- `mode`: Literal["vqa", "caption"] - "vqa" for visual question answering or "caption" for image captioning (default: "vqa")
+- `max_workers`: int - Thread pool size for CPU-intensive operations (default: 10)
+
+**Rate Limits:** By default, the Moondream Cloud API has rate limits. Contact the Moondream team to request higher limits.
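The `api_key` fallback described above (explicit argument first, then the `MOONDREAM_API_KEY` environment variable) could be approximated as below. `resolve_api_key` is a hypothetical helper, and raising `ValueError` when no key is found is an assumption; the README does not state how the plugin behaves in that case.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None, env_var: str = "MOONDREAM_API_KEY") -> str:
    """Hypothetical helper: prefer the explicit argument, then the environment variable."""
    key = api_key or os.environ.get(env_var)
    if not key:
        # Assumed behavior: fail early rather than send unauthenticated requests.
        raise ValueError(f"No API key provided and {env_var} is not set")
    return key
```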
+
+### LocalVLM Parameters
+
+- `mode`: Literal["vqa", "caption"] - "vqa" for visual question answering or "caption" for image captioning (default: "vqa")
+- `max_workers`: int - Thread pool size for async operations (default: 10)
+- `force_cpu`: bool - If True, force CPU usage even if CUDA/MPS is available. Otherwise, auto-detects CUDA, then MPS (Apple Silicon), then defaults to CPU. Note: MPS is automatically converted to CPU due to model compatibility. We recommend running on CUDA for best performance. (default: False)
+- `model_name`: str - Hugging Face model identifier (default: "moondream/moondream3-preview")
+- `options`: AgentOptions - Model directory configuration. If not provided, uses default_agent_options()
+
+**Performance:** Performance will vary depending on your hardware configuration. CUDA is recommended for best performance on NVIDIA GPUs. The model will be downloaded from Hugging Face on first use.
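The `max_workers` parameter bounds a thread pool used to keep blocking work off the event loop. A minimal sketch of that general pattern is shown here; `blocking_inference` and `describe_frames` are hypothetical stand-ins, and the plugin's actual internals are not shown in this README.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def blocking_inference(frame_id: int) -> str:
    """Stand-in for a blocking model call; a real call would run the VLM on a frame."""
    return f"caption for frame {frame_id}"


async def describe_frames(frame_ids: list[int], max_workers: int = 10) -> list[str]:
    """Run blocking calls in a bounded thread pool so the event loop stays responsive."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        tasks = [loop.run_in_executor(pool, blocking_inference, fid) for fid in frame_ids]
        # gather preserves the order of the submitted tasks
        return await asyncio.gather(*tasks)
```

A larger pool allows more frames in flight at once, at the cost of more memory and contention on the device running the model.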
+
 ## Video Publishing
 
 The processor publishes annotated video frames with bounding boxes drawn on detected objects: