A ROS 2 package for voice-controlled robot navigation and interaction, including voice feedback and environment awareness.
## Table of Contents

- Project Goal
- Main Components
- Architecture Notes
- Installation and Setup
- Usage
- Technologies Used
- Package Structure
- Example Workflow
- License
## Project Goal

The robot (Jetson Nano) should:
- Use LiDAR (and optionally a camera later) for SLAM and world model building.
- Be controllable via voice commands.
- Provide voice feedback using Text-to-Speech (TTS).
- Check whether a requested action is possible based on the world model and provide appropriate feedback if not.
- Execute navigation and driving commands accordingly.
## Main Components

| Node | Description |
|---|---|
| `asr_node.py` | Transcribes speech using Whisper |
| `llm_node.py` | Interprets commands using a finetuned Qwen3 0.6B |
| `control_node.py` | Checks feasibility of commands, plans motion |
| `tts_node.py` | Converts feedback text into speech using `tts_models/en/ljspeech/tacotron2-DDC_ph` |
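The four nodes form a simple pipeline (audio → text → command → motion/feedback). The sketch below is a pure-Python illustration of that data flow only; the `transcribe`, `interpret`, and `control` functions are hypothetical stand-ins for Whisper, the finetuned Qwen3 model, and the feasibility checker, not the real node code.

```python
# Illustrative sketch of the asr -> llm -> control pipeline.
# All three functions are hypothetical stand-ins, NOT the real models.

def transcribe(audio_chunk: bytes) -> str:
    # Stand-in for Whisper ASR.
    return "move forward"

def interpret(text: str) -> str:
    # Stand-in for the finetuned Qwen3 0.6B command interpreter.
    known = {"move forward": "MOVE FORWARD 1.0;"}
    return known.get(text.strip().lower(), "COMMAND NOT RECOGNIZED;")

def control(command: str) -> str:
    # Stand-in for the feasibility check in control_node.py.
    if command == "COMMAND NOT RECOGNIZED;":
        return "Sorry, I did not understand that."
    return f"Executing: {command}"

def pipeline(audio_chunk: bytes) -> str:
    return control(interpret(transcribe(audio_chunk)))

print(pipeline(b"\x00\x01"))  # Executing: MOVE FORWARD 1.0;
```

In the actual package, each arrow in this chain is a ROS 2 topic rather than a function call.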
## Architecture Notes

Due to software constraints on the Jetson Nano, inference can be delegated to a locally running FastAPI-based LLM service.
The Nano supports only CUDA 10.2. However, the transformers library requires Python ≥ 3.10, and the only PyTorch build with CUDA support for the Nano targets Python 3.6. Therefore, the local service must run with Python 3.10 and uses CPU-only inference. CUDA cannot be used in this configuration.
Refer to the robo_voice_control_llm_service repository for setup instructions.
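Delegating to the local service amounts to an HTTP POST with the transcribed text. The following is a minimal stdlib-only sketch; the endpoint path `/interpret` and the `{"input": ...}` / `{"output": ...}` JSON shapes are assumptions for illustration — check the robo_voice_control_llm_service repository for the real API.

```python
import json
import urllib.request

SERVICE_URL = "http://127.0.0.1:8000/interpret"  # assumed endpoint

def build_request(text: str, url: str = SERVICE_URL) -> urllib.request.Request:
    """Build the POST request for the local LLM service (payload shape assumed)."""
    payload = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_response(body: bytes) -> str:
    """Extract the interpreted command from a JSON response body (shape assumed)."""
    return json.loads(body.decode("utf-8"))["output"]

# Example call (requires the service to be running):
# with urllib.request.urlopen(build_request("Go forward for 2.8 meters")) as r:
#     print(parse_response(r.read()))
```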
The asr_node was initially designed to run the ASR model (Moonshine) locally within the ROS node. However, on the Jetson Nano this is not viable because:

- The `transformers` version compatible with Python 3.8 is too old to support `MoonshineForConditionalGeneration`
- The required ASR model is not available under the current setup due to Python and CUDA constraints
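A runtime guard of the following shape can decide between local and remote ASR inference. `MoonshineForConditionalGeneration` is the real `transformers` class name mentioned above, but the backend labels and fallback logic are an illustrative assumption, not the actual asr_node code.

```python
import importlib.util

def asr_backend() -> str:
    """Pick an ASR backend at runtime.

    Tries to import MoonshineForConditionalGeneration from the installed
    transformers; falls back to a remote/Whisper backend when the local
    install is too old to provide it. Labels are illustrative.
    """
    if importlib.util.find_spec("transformers") is not None:
        try:
            from transformers import MoonshineForConditionalGeneration  # noqa: F401
            return "local-moonshine"
        except (ImportError, AttributeError):
            pass
    return "remote-whisper"

print(asr_backend())
```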
## Package Structure

```
robo_voice_control/
├── asr_node.py
├── control_node.py
├── llm_node.py
└── tts_node.py
```
## Example Workflow

1. User gives a voice command: "Move forward"
2. Microphone captures audio → Whisper transcribes it
3. LLM processes the text → interprets it as `"move_forward"`
4. Control node checks the SLAM-based world model:
   - If the action is possible → initiates driving
   - If not → responds via TTS: "That action is not possible."
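The feasibility check in step 4 can be pictured as a cell walk over the occupancy grid that SLAM produces. Everything in this sketch (grid layout, resolution, function name, robot heading) is an illustrative assumption, not the actual control_node implementation:

```python
FREE, OCCUPIED = 0, 1

def forward_path_clear(grid, row, col, distance_m, resolution_m=0.05):
    """Check whether the robot can move `distance_m` straight ahead.

    `grid` is a hypothetical occupancy grid (0 = free, 1 = occupied);
    the robot is assumed to face increasing column index. This is a
    sketch, not the real SLAM world-model query.
    """
    cells = round(distance_m / resolution_m)
    for step in range(1, cells + 1):
        c = col + step
        if c >= len(grid[row]) or grid[row][c] == OCCUPIED:
            return False
    return True

grid = [
    [0, 0, 0, 0, 1],  # obstacle 0.2 m ahead in this row
    [0, 0, 0, 0, 0],
]
print(forward_path_clear(grid, row=0, col=0, distance_m=0.15))  # True
print(forward_path_clear(grid, row=0, col=0, distance_m=0.25))  # False
```

When the check fails, the control node would publish the refusal text for the TTS node instead of a motion command.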
## Technologies Used

- ROS 2 (rclpy)
- Whisper for ASR (Automatic Speech Recognition)
- Finetuned Qwen3 0.6B for command interpretation
- TTS: `tts_models/en/ljspeech/tacotron2-DDC_ph`
- SLAM using LiDAR (SLAM Toolbox)
- SLAMTEC RPLIDAR ROS 2 package for LiDAR sensor integration
- Jetson platform for local inference
## Installation and Setup

Prerequisites:

- ROS 2 Humble Hawksbill
- Python 3.8+
- Ubuntu 22.04 LTS (recommended)
- Audio capture device (microphone)
- Audio output device (speakers/headphones)
```bash
# Install system dependencies
sudo apt update
sudo apt install -y python3-pip python3-dev portaudio19-dev
sudo apt install -y ros-humble-slam-toolbox
sudo apt install -y ros-humble-sound-play
```

Install the required Python packages:

```bash
pip install -r requirements.txt
```

Install rosdep (if not already installed):
```bash
sudo apt-get install python3-rosdep
sudo rosdep init
rosdep update
```

Install ROS dependencies:
```bash
rosdep install --from-paths src --ignore-src -r -y
```

Build the package:

```bash
colcon build --symlink-install
```

Alternatively, build with CUDA disabled:

```bash
colcon build --symlink-install --cmake-args -DGGML_CUDA=Off
```
Source the workspace:

```bash
source install/setup.bash
```

## Usage

Start the SLAM system:

```bash
ros2 launch robo_voice_control slam_launch.py
```

Start all voice control nodes:

```bash
ros2 launch robo_voice_control all_nodes_launch.py
```

### Audio Capturer

Topic: `/audio`
GitHub: audio_common
Start the audio capturer node:
```bash
ros2 run audio_common audio_capturer_node
```

### ASR Node

Publishes the last 10 seconds of transcribed audio.

Topic: `/asr/text`

Start the ASR node:

```bash
ros2 run robo_voice_control asr_node
```

### LLM Node

Interprets the ASR text from the topic `/asr/text` and converts it into commands.
Finetuned Qwen3 0.6B: Finetuned using this repository.
Example:
```json
{"input": "Go forward for 2.8 meters", "output": "MOVE FORWARD 2.8;"}
{"input": "Drive in a circle", "output": "COMMAND NOT RECOGNIZED;"}
```

Publishes: `/llm/command_interpretations`
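Before the control node can act on a flat command string like `MOVE FORWARD 2.8;`, it has to be parsed. A small sketch of such a parser follows; the grammar (`VERB [number] ';'`) is inferred from the two training examples above and may not match the model's full output format.

```python
def parse_command(raw: str):
    """Parse a command string like 'MOVE FORWARD 2.8;' into (verb, value).

    Returns None for unrecognized commands. The grammar is an
    assumption inferred from the example training pairs.
    """
    raw = raw.strip().rstrip(";").strip()
    if not raw or raw == "COMMAND NOT RECOGNIZED":
        return None
    parts = raw.split()
    # A trailing numeric token is treated as the command's argument.
    try:
        value = float(parts[-1])
        verb = " ".join(parts[:-1])
    except ValueError:
        value = None
        verb = " ".join(parts)
    return verb, value

print(parse_command("MOVE FORWARD 2.8;"))       # ('MOVE FORWARD', 2.8)
print(parse_command("COMMAND NOT RECOGNIZED;"))  # None
```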
Parameters:

- `model_path` → path to the LLM model, e.g., `"/media/psf/DATA SSD/LLMs/finetunes/qwen3_0.6B/checkpoint-310"`

Default:

```bash
ros2 run robo_voice_control llm_node
```

With the model parameter:

```bash
ros2 run robo_voice_control llm_node \
  --ros-args -p model_path:="/media/psf/DATA SSD/LLMs/finetunes/qwen3_0.6B/checkpoint-310"
```

### TTS Node

Install dependencies:
```bash
pip install TTS sounddevice numpy
sudo apt install ros-humble-sound-play
```

Start the TTS node:

```bash
ros2 run robo_voice_control tts_node
```

### LiDAR

Important: Transforms must be started before the SLAM Toolbox.
```bash
sudo chmod 777 /dev/ttyUSB1
ros2 launch sllidar_ros2 view_sllidar_a1_launch.py
```

Verify the TF tree is properly configured:

```bash
ros2 run tf2_tools view_frames
```

## License

This project is licensed under the MIT License; see the LICENSE file for details.
## Contributing

Contributions are welcome! Please feel free to submit a pull request.
## Acknowledgments

- OpenAI Whisper for ASR
- Qwen3 Model for command interpretation finetuned by Christopher Witzl
- TTS for text-to-speech synthesis
- SLAM Toolbox for mapping and localization