RoboML

RoboML is an aggregator package for quickly deploying open-source ML models for robots. It supports three main use cases:

  • Rapid deployment of general-purpose models: Wraps around popular ML libraries like 🤗 Transformers, allowing fast deployment of models through scalable server endpoints.
  • Deploy detection models with tracking: Supports deployment of all detection models in MMDetection with optional tracking integration.
  • Aggregate robot-specific models from the robotics community: Intended as a platform for community-contributed multimodal models, usable in planning and control, especially with ROS components. See EmbodiedAgents.

Models And Wrappers

| Model Class | Description | Default Checkpoint / Resource | Key Init Parameters |
|---|---|---|---|
| TransformersLLM | General-purpose large language model (LLM) from 🤗 Transformers | microsoft/Phi-3-mini-4k-instruct | name, checkpoint, quantization, init_timeout |
| TransformersMLLM | Multimodal vision-language model (MLLM) from 🤗 Transformers | HuggingFaceM4/idefics2-8b | name, checkpoint, quantization, init_timeout |
| RoboBrain2 | Embodied planning and multimodal reasoning via RoboBrain 2.0 | BAAI/RoboBrain2.0-7B | name, checkpoint, init_timeout |
| Whisper | Multilingual speech-to-text (ASR) from OpenAI Whisper | small.en (see checkpoint list) | name, checkpoint, compute_type, init_timeout |
| SpeechT5 | Text-to-speech model from Microsoft SpeechT5 | microsoft/speecht5_tts | name, checkpoint, voice, init_timeout |
| Bark | Text-to-speech model from SunoAI Bark | suno/bark-small (voice options) | name, checkpoint, voice, attn_implementation, init_timeout |
| MeloTTS | Multilingual text-to-speech via MeloTTS | EN, EN-US | name, language, speaker_id, init_timeout |
| VisionModel | Detection + tracking via MMDetection | dino-4scale_r50_8xb2-12e_coco | name, checkpoint, setup_trackers, cache_dir, tracking_distance_function, tracking_distance_threshold, deploy_tensorrt, _num_trackers, init_timeout |
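To make the parameter columns concrete, here is an illustrative sketch of a model configuration mirroring the Whisper row above. This is not the documented RoboML constructor; the class shape, field values, and units are assumptions, with only the parameter names and default checkpoint taken from the table:

```python
# Illustrative sketch only: parameter names and the default checkpoint come
# from the table above; the dataclass itself and the example values are
# assumptions, not the actual RoboML API.
from dataclasses import dataclass


@dataclass
class WhisperConfig:
    name: str = "speech_to_text"   # hypothetical instance name
    checkpoint: str = "small.en"   # default checkpoint from the table
    compute_type: str = "float16"  # assumed value; see the Whisper checkpoint docs
    init_timeout: int = 600        # assumed to be seconds to wait for model init


cfg = WhisperConfig()
print(cfg.checkpoint, cfg.init_timeout)
```

Each model class in the table follows the same pattern: a name for the deployed instance, a checkpoint (or language/voice resource), and an init_timeout to bound model loading.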

Installation

RoboML has been tested on Ubuntu 20.04 and later. A GPU with CUDA 12.1 or later is recommended. If you run into problems, please open an issue on GitHub.

pip install roboml

From Source

git clone https://github.com/automatika-robotics/roboml.git && cd roboml
virtualenv venv && source venv/bin/activate
pip install pip-tools
pip install .

Vision Model Support

To use detection and tracking features via MMDetection:

  • Install RoboML with the vision extras:

    pip install "roboml[vision]"
  • Install mmcv using the appropriate CUDA and PyTorch versions as described in their docs. Example for PyTorch 2.1 with CUDA 12.1:

    pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html
  • Install mmdetection:

    git clone https://github.com/open-mmlab/mmdetection.git
    cd mmdetection
    pip install -v -e .
  • If ffmpeg or libGL is missing:

    sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
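The mmcv wheel index URL in the example above follows a fixed pattern derived from the CUDA and PyTorch versions. A small helper (a sketch for clarity, not part of RoboML) shows how the pieces fit together:

```python
def mmcv_index_url(cuda: str, torch: str) -> str:
    """Build the OpenMMLab find-links URL for a given CUDA/PyTorch pair.

    cuda:  CUDA version such as "12.1", mapped to a "cu121" tag.
    torch: PyTorch version such as "2.1" (patch level is dropped).
    """
    cu_tag = "cu" + cuda.replace(".", "")
    return f"https://download.openmmlab.com/mmcv/dist/{cu_tag}/torch{torch}/index.html"


# Reproduces the PyTorch 2.1 + CUDA 12.1 example above:
print(mmcv_index_url("12.1", "2.1"))
# https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html
```

Substitute your own versions to locate the matching wheel index, then pass it to pip with -f as shown above.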

TensorRT-Based Model Deployment

RoboML vision models can optionally be accelerated with NVIDIA TensorRT on Linux x86_64 systems. For setup, follow the TensorRT installation guide.

Docker Build (Recommended)

Jetson users are especially encouraged to use Docker.

git clone https://github.com/automatika-robotics/roboml.git && cd roboml

# Build container image
docker build --tag=automatika:roboml .
# For Jetson boards:
docker build --tag=automatika:roboml -f Dockerfile.Jetson .

# Run HTTP server
docker run --runtime=nvidia --gpus all --rm -p 8000:8000 automatika:roboml roboml
# Or run RESP server
docker run --runtime=nvidia --gpus all --rm -p 6379:6379 automatika:roboml roboml-resp
  • (Optional) Mount your cache dir to persist downloaded models:

    -v ~/.cache:/root/.cache

Servers

RoboML uses Ray Serve to host models as scalable apps across various environments.

WebSocket Endpoint

WebSocket endpoints are exposed for streaming use cases (e.g., STT/TTS).

Experimental RESP Server

For ultra-low latency in robotics, RoboML also includes a RESP-based server compatible with any Redis client. RESP (see spec) is a lightweight, binary-safe protocol. Combined with msgpack instead of JSON, it enables very fast I/O, ideal for binary data like images, audio, or video.
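To make the framing concrete, here is a minimal stdlib-only sketch of how a command is serialized under the RESP spec. The function below is illustrative (any Redis client already does this for you) and the PING command is just a protocol-level example, not a RoboML endpoint:

```python
def encode_resp(*args: bytes) -> bytes:
    """Encode a command as a RESP array of bulk strings.

    Every argument is length-prefixed, so payloads may contain arbitrary
    binary data (e.g. msgpack-encoded images or audio) with no escaping.
    """
    out = b"*%d\r\n" % len(args)
    for arg in args:
        out += b"$%d\r\n%s\r\n" % (len(arg), arg)
    return out


print(encode_resp(b"PING"))
# b'*1\r\n$4\r\nPING\r\n'
```

The length prefixes are what make the protocol binary-safe and cheap to parse, which is why pairing RESP with msgpack payloads gives fast I/O for media data. In practice, use any Redis client library rather than hand-rolling the encoding.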

This work is inspired by @hansonkd’s Tino project.

Usage

Run the HTTP server:

roboml

Run the RESP server:

roboml-resp

Example usage in ROS clients is documented in ROS Agents.
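As a rough sketch of talking to the HTTP server from plain Python once it is running on port 8000: the route name and payload keys below are hypothetical (the actual endpoints depend on which model apps you deploy; see the client documentation linked above), so only the request mechanics are shown:

```python
import json
import urllib.request

SERVER = "http://localhost:8000"


def build_request(route: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for a model endpoint served on SERVER.

    The route and payload keys depend on how the model app was registered;
    the values used below are illustrative only.
    """
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SERVER + route,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("/example_llm", {"query": "Hello"})  # hypothetical route and key
print(req.full_url)  # http://localhost:8000/example_llm
# urllib.request.urlopen(req) would send the request once the server is up.
```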

Running Tests

Install dev dependencies:

pip install ".[dev]"

Run tests from the project root:

python -m pytest

Copyright

Unless otherwise specified, all code is © 2024 Automatika Robotics. RoboML is released under the MIT License. See LICENSE for details.

Contributions

RoboML is developed in collaboration between Automatika Robotics and Inria. Community contributions are welcome!