mcvella/viam-moondream-vision-modal

moondream modular vision service

This module implements the rdk vision API as the mcvella:vision:moondream-modal model.

This model uses the Moondream tiny vision language model (VLM) for image classification, free-form questions about an image, and object and gaze detection. Inference runs on the Modal platform, so you can add serverless, cloud-based VLM capabilities to your Viam machines without running the model on the machine itself.

Build and Run

To use this module, follow Viam's instructions for adding a module from the Viam Registry, then select the mcvella:vision:moondream-modal model from the moondream-vision module.

You will also need to sign up for a Modal account, create a workspace, and then create an API token. You must supply the token ID and secret in your module configuration, as described in the next section.

Configure Modal API Token

In the Viam app, you will need to configure access to your Modal account by setting environment variables for this module. To do so, in CONFIGURE, click on JSON, and within the service configuration for this module, add:

      "env": {
        "MODAL_TOKEN_ID": "YOURTOKENHERE",
        "MODAL_TOKEN_SECRET": "YOURSECRETHERE",
        "MODAL_GPU_TYPE": "L4"
      }

The MODAL_GPU_TYPE environment variable is optional and defaults to "L4" if not specified. You can change this to use different GPU types available on the Modal platform, such as "A10G" or "A100" for more computational power (and higher cost) or "T4" for lower-powered workloads.
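The module picks these settings up from its environment. The sketch below illustrates how the documented rules resolve: both token variables are required and the GPU type falls back to "L4". It assumes the module treats the variables this way; `read_modal_settings` and `ModalSettings` are hypothetical names, not part of the module's code.

```python
import os
from typing import Mapping, NamedTuple


class ModalSettings(NamedTuple):
    token_id: str
    token_secret: str
    gpu_type: str


def read_modal_settings(env: Mapping[str, str] = os.environ) -> ModalSettings:
    """Hypothetical helper: collect the Modal settings described above.

    MODAL_TOKEN_ID and MODAL_TOKEN_SECRET are required; MODAL_GPU_TYPE
    falls back to "L4" when it is unset or empty.
    """
    missing = [k for k in ("MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET") if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing Modal credentials: {', '.join(missing)}")
    return ModalSettings(
        token_id=env["MODAL_TOKEN_ID"],
        token_secret=env["MODAL_TOKEN_SECRET"],
        gpu_type=env.get("MODAL_GPU_TYPE") or "L4",
    )
```

Failing fast on a missing token gives a clear error at startup instead of an authentication failure on the first inference call.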

Configure your vision service

Note

Before configuring your vision service, you must create a machine.

Navigate to the Config tab of your robot's page in the Viam app. Click on the Service subtab and click Create service. Select the vision type, then select the mcvella:vision:moondream-modal model. Enter a name for your vision service and click Create.

Note

For more information, see Configure a Robot.

Attributes

The following attributes are available for the mcvella:vision:moondream-modal model:

| Name | Type | Inclusion | Description |
|------|------|-----------|-------------|
| default_question | string | optional | For classifications, the default question to ask about the image. Defaults to "describe this image". |
| default_class | string | optional | For detections, the default class to detect in the image. Defaults to "person". |
| gaze_detection | boolean | optional | If set to true, detections will be gaze detections. Defaults to false. |
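As an illustration of how these attributes fit into a service entry, here is a minimal sketch that assembles one entry for the "services" array of a Viam machine config. It assumes the standard `name`/`namespace`/`type`/`model`/`attributes`/`depends_on` keys; the helper name `moondream_service_config` and the defaults `"moondream"` and `"cam"` are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def moondream_service_config(
    name: str = "moondream",
    camera: Optional[str] = "cam",
    default_question: Optional[str] = None,
    default_class: Optional[str] = None,
    gaze_detection: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build one "services" entry for this model.

    Attributes left as None are omitted, so the module's documented
    defaults ("describe this image", "person", false) apply.
    """
    attributes: Dict[str, Any] = {}
    if default_question is not None:
        attributes["default_question"] = default_question
    if default_class is not None:
        attributes["default_class"] = default_class
    if gaze_detection is not None:
        attributes["gaze_detection"] = gaze_detection
    service: Dict[str, Any] = {
        "name": name,
        "namespace": "rdk",
        "type": "vision",
        "model": "mcvella:vision:moondream-modal",
        "attributes": attributes,
    }
    # Cameras used by the *_from_camera methods must be listed in depends_on.
    if camera is not None:
        service["depends_on"] = [camera]
    return service


if __name__ == "__main__":
    print(json.dumps(moondream_service_config(default_class="shoes"), indent=2))
```

`json.dumps` writes Python `True`/`False` as JSON `true`/`false`, so the printed output can be pasted into the JSON editor.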

API

The moondream resource provides the following methods from Viam's built-in rdk:service:vision API:

get_classifications(image=binary, count)

get_classifications_from_camera(camera_name=string, count)

Note: if using this method, any cameras you are using must be set in the depends_on array for the service configuration, for example:

      "depends_on": [
        "cam"
      ]

By default, the Moondream model will be asked the question "describe this image". If you want to ask a different question about the image, you can pass that question as the extra parameter "question". For example:

await moondream.get_classifications(image, 1, extra={"question": "what is the person wearing?"})
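To wrap the question-asking pattern in one call that works with either an image or a configured camera, a minimal sketch follows. It assumes `vision` is a connected Viam Python SDK `VisionClient` (for example from `VisionClient.from_robot(robot, "moondream")`). It also assumes the answer text comes back as the `class_name` of the first classification; that is an assumption, not something this README states. `ask_moondream` is a hypothetical name.

```python
from typing import Any, Optional


async def ask_moondream(
    vision: Any,
    question: str,
    *,
    image: Any = None,
    camera_name: Optional[str] = None,
) -> str:
    """Ask Moondream a question about an image or a camera's current frame.

    `vision` is assumed to be a connected VisionClient (or any object with
    the same async methods). The "question" extra overrides default_question
    for this call only.
    """
    extra = {"question": question}
    if camera_name is not None:
        # The camera must be listed in the service's depends_on array.
        results = await vision.get_classifications_from_camera(camera_name, 1, extra=extra)
    elif image is not None:
        results = await vision.get_classifications(image, 1, extra=extra)
    else:
        raise ValueError("pass either image or camera_name")
    # Assumption: the model's answer is carried in the top result's class_name.
    return results[0].class_name if results else ""
```

For example, `answer = await ask_moondream(moondream, "what is the person wearing?", camera_name="cam")`.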

get_detections(image=binary)

get_detections_from_camera(camera_name=string)

Note: if using this method, any cameras you are using must be set in the depends_on array for the service configuration, for example:

      "depends_on": [
        "cam"
      ]

By default, the Moondream model will look for the class "person". If you want to detect another class, you can pass that class as the extra parameter "class". For example:

await moondream.get_detections(image, extra={"class": "shoes"})
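To show the detection call end to end, here is a minimal sketch that asks for a class on a camera frame and orders the boxes. It assumes `vision` is a connected Viam `VisionClient` and that each detection exposes the `x_min`, `y_min`, `x_max`, and `y_max` pixel coordinates of Viam's `Detection` message. `find_class` and `box_area` are hypothetical names, and largest-first ordering is an arbitrary choice made for illustration.

```python
from typing import Any, List


def box_area(d: Any) -> int:
    """Pixel area of a detection's bounding box."""
    return (d.x_max - d.x_min) * (d.y_max - d.y_min)


async def find_class(vision: Any, target: str, camera_name: str = "cam") -> List[Any]:
    """Ask Moondream to locate `target` in the camera's current frame.

    The "class" extra overrides default_class for this call only.
    Results are returned largest bounding box first.
    """
    detections = await vision.get_detections_from_camera(camera_name, extra={"class": target})
    return sorted(detections, key=box_area, reverse=True)
```

Sorting by area is a simple way to pick the most prominent instance when several are found.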

To use Moondream's gaze detection capabilities, either set the gaze_detection attribute to true in your service configuration, or pass gaze_detection as an extra parameter to the detections call, for example:

await moondream.get_detections(image, extra={"gaze_detection": True})

If gaze detection is enabled, your detections will be returned with class names of the form face_<counter> and gaze_<counter>, where the counter links each face to the detection marking where that face is looking.
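For code that consumes gaze results, the sketch below groups face and gaze detections that share a counter. It assumes class names follow exactly the face_<counter> / gaze_<counter> pattern described above with a non-negative integer counter. `pair_faces_with_gazes` is a hypothetical name, and labels that do not match the pattern are skipped.

```python
import re
from typing import Any, Dict, Iterable, Tuple

# Assumed label format: "face_<n>" or "gaze_<n>", n a non-negative integer.
_LABEL = re.compile(r"^(face|gaze)_(\d+)$")


def pair_faces_with_gazes(detections: Iterable[Any]) -> Dict[int, Tuple[Any, Any]]:
    """Return {counter: (face_detection, gaze_detection)} for every counter
    that has both a face and a gaze detection, in ascending counter order."""
    faces: Dict[int, Any] = {}
    gazes: Dict[int, Any] = {}
    for d in detections:
        m = _LABEL.match(d.class_name)
        if not m:
            continue  # not a gaze-mode label
        kind, n = m.group(1), int(m.group(2))
        (faces if kind == "face" else gazes)[n] = d
    return {n: (faces[n], gazes[n]) for n in sorted(faces.keys() & gazes.keys())}
```

A face whose gaze target was not found simply has no entry in the result, so callers can treat a missing counter as "gaze unknown".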
