RoboML

RoboML is an aggregator package for quickly deploying open-source ML models for robots. It supports three main use cases:

  • Rapid deployment of general-purpose models: Wraps around popular ML libraries like 🤗 Transformers, allowing fast deployment of models through scalable server endpoints.
  • Deploy detection models with tracking: Supports deployment of all detection models in MMDetection with optional tracking integration.
  • Aggregate robot-specific models from the robotics community: Intended as a platform for community-contributed multimodal models, usable in planning and control, especially with ROS components. See EmbodiedAgents.

Models And Wrappers

| Model Class | Description | Default Checkpoint / Resource | Key Init Parameters |
|---|---|---|---|
| TransformersLLM | General-purpose large language model (LLM) from 🤗 Transformers | microsoft/Phi-3-mini-4k-instruct | name, checkpoint, quantization, init_timeout |
| TransformersMLLM | Multimodal vision-language model (MLLM) from 🤗 Transformers | HuggingFaceM4/idefics2-8b | name, checkpoint, quantization, init_timeout |
| RoboBrain2 | Embodied planning and multimodal reasoning via RoboBrain 2.0 | BAAI/RoboBrain2.0-7B | name, checkpoint, init_timeout |
| Whisper | Multilingual speech-to-text (ASR) from OpenAI Whisper | small.en (see checkpoint list) | name, checkpoint, compute_type, init_timeout |
| SpeechT5 | Text-to-speech model from Microsoft SpeechT5 | microsoft/speecht5_tts | name, checkpoint, voice, init_timeout |
| Bark | Text-to-speech model from SunoAI Bark | suno/bark-small (voice options) | name, checkpoint, voice, attn_implementation, init_timeout |
| MeloTTS | Multilingual text-to-speech via MeloTTS | EN, EN-US | name, language, speaker_id, init_timeout |
| VisionModel | Detection + tracking via MMDetection | dino-4scale_r50_8xb2-12e_coco | name, checkpoint, setup_trackers, cache_dir, tracking_distance_function, tracking_distance_threshold, deploy_tensorrt, _num_trackers, init_timeout |
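To make the parameter columns concrete, here is an illustrative sketch of a model configuration mirroring the Whisper row above. This is not the documented RoboML constructor; the class shape, field values, and units are assumptions, with only the parameter names and default checkpoint taken from the table:

```python
# Illustrative sketch only: parameter names and the default checkpoint come
# from the table above; the dataclass itself and the example values are
# assumptions, not the actual RoboML API.
from dataclasses import dataclass


@dataclass
class WhisperConfig:
    name: str = "speech_to_text"   # hypothetical instance name
    checkpoint: str = "small.en"   # default checkpoint from the table
    compute_type: str = "float16"  # assumed value; see the Whisper checkpoint docs
    init_timeout: int = 600        # assumed to be seconds to wait for model init


cfg = WhisperConfig()
print(cfg.checkpoint, cfg.init_timeout)
```

Each model class in the table follows the same pattern: a name for the deployed instance, a checkpoint (or language/voice resource), and an init_timeout to bound model loading.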

Installation

RoboML has been tested on Ubuntu 20.04 and later. A GPU with CUDA 12.1 or later is recommended. If you run into problems, please open an issue on GitHub.

pip install roboml

From Source

git clone https://github.com/automatika-robotics/roboml.git && cd roboml
virtualenv venv && source venv/bin/activate
pip install pip-tools
pip install .

Vision Model Support

To use detection and tracking features via MMDetection:

  • Install RoboML with the vision extras:

    pip install "roboml[vision]"
  • Install mmcv using the appropriate CUDA and PyTorch versions as described in their docs. Example for PyTorch 2.1 with CUDA 12.1:

    pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html
  • Install mmdetection:

    git clone https://github.com/open-mmlab/mmdetection.git
    cd mmdetection
    pip install -v -e .
  • If ffmpeg or libGL is missing:

    sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
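The mmcv wheel index URL in the example above follows a fixed pattern derived from the CUDA and PyTorch versions. A small helper (a sketch for clarity, not part of RoboML) shows how the pieces fit together:

```python
def mmcv_index_url(cuda: str, torch: str) -> str:
    """Build the OpenMMLab find-links URL for a given CUDA/PyTorch pair.

    cuda:  CUDA version such as "12.1", mapped to a "cu121" tag.
    torch: PyTorch version such as "2.1" (patch level is dropped).
    """
    cu_tag = "cu" + cuda.replace(".", "")
    return f"https://download.openmmlab.com/mmcv/dist/{cu_tag}/torch{torch}/index.html"


# Reproduces the PyTorch 2.1 + CUDA 12.1 example above:
print(mmcv_index_url("12.1", "2.1"))
# https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html
```

Substitute your own versions to locate the matching wheel index, then pass it to pip with -f as shown above.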

TensorRT-Based Model Deployment

RoboML vision models can optionally be accelerated with NVIDIA TensorRT on Linux x86_64 systems. For setup, follow the TensorRT installation guide.

Docker Build (Recommended)

Jetson users are especially encouraged to use Docker.

git clone https://github.com/automatika-robotics/roboml.git && cd roboml

# Build container image
docker build --tag=automatika:roboml .
# For Jetson boards:
docker build --tag=automatika:roboml -f Dockerfile.Jetson .

# Run HTTP server
docker run --runtime=nvidia --gpus all --rm -p 8000:8000 automatika:roboml roboml
# Or run RESP server
docker run --runtime=nvidia --gpus all --rm -p 6379:6379 automatika:roboml roboml-resp
  • (Optional) Mount your cache dir to persist downloaded models:

    -v ~/.cache:/root/.cache

Servers

RoboML uses Ray Serve to host models as scalable apps across various environments.

WebSocket Endpoint

WebSocket endpoints are exposed for streaming use cases (e.g., STT/TTS).

Experimental RESP Server

For ultra-low latency in robotics, RoboML also includes a RESP-based server compatible with any Redis client. RESP (see spec) is a lightweight, binary-safe protocol. Combined with msgpack instead of JSON, it enables very fast I/O, ideal for binary data like images, audio, or video.
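To make the framing concrete, here is a minimal stdlib-only sketch of how a command is serialized under the RESP spec. The function below is illustrative (any Redis client already does this for you) and the PING command is just a protocol-level example, not a RoboML endpoint:

```python
def encode_resp(*args: bytes) -> bytes:
    """Encode a command as a RESP array of bulk strings.

    Every argument is length-prefixed, so payloads may contain arbitrary
    binary data (e.g. msgpack-encoded images or audio) with no escaping.
    """
    out = b"*%d\r\n" % len(args)
    for arg in args:
        out += b"$%d\r\n%s\r\n" % (len(arg), arg)
    return out


print(encode_resp(b"PING"))
# b'*1\r\n$4\r\nPING\r\n'
```

The length prefixes are what make the protocol binary-safe and cheap to parse, which is why pairing RESP with msgpack payloads gives fast I/O for media data. In practice, use any Redis client library rather than hand-rolling the encoding.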

This work is inspired by @hansonkd’s Tino project.

Usage

Run the HTTP server:

roboml

Run the RESP server:

roboml-resp

Example usage in ROS clients is documented in ROS Agents.
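As a rough sketch of talking to the HTTP server from plain Python once it is running on port 8000: the route name and payload keys below are hypothetical (the actual endpoints depend on which model apps you deploy; see the client documentation linked above), so only the request mechanics are shown:

```python
import json
import urllib.request

SERVER = "http://localhost:8000"


def build_request(route: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for a model endpoint served on SERVER.

    The route and payload keys depend on how the model app was registered;
    the values used below are illustrative only.
    """
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SERVER + route,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("/example_llm", {"query": "Hello"})  # hypothetical route and key
print(req.full_url)  # http://localhost:8000/example_llm
# urllib.request.urlopen(req) would send the request once the server is up.
```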

Running Tests

Install dev dependencies:

pip install ".[dev]"

Run tests from the project root:

python -m pytest

Copyright

Unless otherwise specified, all code is © 2024 Automatika Robotics. RoboML is released under the MIT License. See LICENSE for details.

Contributions

RoboML is developed in collaboration between Automatika Robotics and Inria. Community contributions are welcome!