
Robo Voice Control

A ROS 2 package for voice-controlled robot navigation and interaction, including voice feedback and environment awareness.

Project Goal

The robot (Jetson Nano) should:

  • Use LiDAR (and optionally a camera later) for SLAM and world model building.
  • Be controllable via voice commands.
  • Provide voice feedback using Text-to-Speech (TTS).
  • Check whether a requested action is possible based on the world model and provide appropriate feedback if not.
  • Execute navigation and driving commands accordingly.

Main Components

Node              Description
asr_node.py       Transcribes speech using Whisper
llm_node.py       Interprets commands using a finetuned Qwen3 0.6B
control_node.py   Checks feasibility of commands, plans motion
tts_node.py       Converts feedback text into speech using tts_models/en/ljspeech/tacotron2-DDC_ph

Architecture Notes

LLM Service Architecture

Due to software constraints on the Jetson Nano, inference can be delegated to a locally running FastAPI-based LLM service.

The Nano supports only CUDA 10.2. However, the transformers library requires Python ≥ 3.10, while the only PyTorch build with CUDA support for the Nano targets Python 3.6. The local service therefore runs under Python 3.10 with CPU-only inference; CUDA cannot be used in this configuration.

Refer to the robo_voice_control_llm_service repository for setup instructions.
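From the ROS side, talking to the local service reduces to a small HTTP client. The sketch below uses only the standard library; the endpoint URL and the JSON field name `input` are assumptions, not the service's documented API (see the service repository for the real contract).

```python
import json
import urllib.request

# Assumed endpoint of the locally running FastAPI service.
SERVICE_URL = "http://localhost:8000/generate"

def build_request(text: str) -> bytes:
    """Encode the transcribed text as a JSON request body (field name assumed)."""
    return json.dumps({"input": text}).encode("utf-8")

def query_llm_service(text: str, timeout: float = 10.0) -> str:
    """POST the text to the local service and return the raw response body."""
    req = urllib.request.Request(
        SERVICE_URL,
        data=build_request(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Keeping the HTTP round trip behind a single function like this keeps the ROS node itself free of transport details.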

ASR Service Architecture

The asr_node was initially designed to run the ASR model (Moonshine) locally within the ROS node. However, on the Jetson Nano, this is not viable because:

  • The transformers library version compatible with Python 3.8 is too old to support MoonshineForConditionalGeneration
  • The required ASR model is not available under the current setup due to Python and CUDA constraints

Package Structure

robo_voice_control/
├── asr_node.py
├── control_node.py
├── llm_node.py
└── tts_node.py

Example Workflow

  1. User gives voice command: "Move forward"

  2. Microphone captures audio → Whisper transcribes it

  3. LLM processes the text → interprets as "move_forward"

  4. Control node checks SLAM-based world model:

    • If action is possible → initiates driving
    • If not possible → responds via TTS: "That action is not possible."
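The feasibility check in step 4 can be pictured as a ray cast through the SLAM occupancy grid. The following is a minimal sketch, not the package's actual API: the grid layout, cell resolution, and function names are all illustrative.

```python
# Minimal sketch of a feasibility check against an occupancy grid.
# Grid values: 0 = free, 1 = occupied (layout and names are illustrative).

def is_path_clear(grid, start, heading, distance, resolution=0.1):
    """Check every cell along a straight move for obstacles.

    grid:       2D list indexed as grid[row][col]
    start:      (row, col) cell of the robot
    heading:    (drow, dcol) step per cell, e.g. (0, 1) = forward along columns
    distance:   requested travel in meters
    resolution: meters per cell
    """
    steps = int(distance / resolution)
    row, col = start
    for _ in range(steps):
        row += heading[0]
        col += heading[1]
        if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
            return False  # would leave the mapped area
        if grid[row][col] != 0:
            return False  # obstacle in the way
    return True

# Example: 1 m forward over free cells succeeds; with a wall 0.5 m ahead it fails.
free_map = [[0] * 20 for _ in range(5)]
walled_map = [row[:] for row in free_map]
walled_map[2][7] = 1  # obstacle in the robot's lane
```

When the check returns False, the control node would hand a message like "That action is not possible." to the TTS node instead of driving.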

Technologies Used

  • ROS 2 (rclpy)
  • Whisper for ASR (Automatic Speech Recognition)
  • Finetuned Qwen3 0.6B for command interpretation
  • TTS: tts_models/en/ljspeech/tacotron2-DDC_ph
  • SLAM using LiDAR (SLAM Toolbox)
  • SLAMTEC RPLIDAR ROS 2 package for LiDAR sensor integration
  • Jetson platform for local inference

Installation and Setup

Prerequisites

  • ROS 2 Humble Hawksbill
  • Python 3.8+
  • Ubuntu 22.04 LTS (recommended)
  • Audio capture device (microphone)
  • Audio output device (speakers/headphones)

System Dependencies

# Install system dependencies
sudo apt update
sudo apt install -y python3-pip python3-dev portaudio19-dev
sudo apt install -y ros-humble-slam-toolbox
sudo apt install -y ros-humble-sound-play

Python Dependencies

Install the required Python packages:

pip install -r requirements.txt

ROS 2 Setup

  1. Install rosdep (if not already installed):
sudo apt-get install python3-rosdep
sudo rosdep init
rosdep update
  2. Install ROS dependencies:
rosdep install --from-paths src --ignore-src -r -y
  3. Build the package (use the --cmake-args variant to disable CUDA for the ggml-based components):
colcon build --symlink-install
colcon build --symlink-install --cmake-args -DGGML_CUDA=Off
source install/setup.bash

Usage

Quick Start

  1. Start the SLAM system:
ros2 launch robo_voice_control slam_launch.py
  2. Start all voice control nodes:
ros2 launch robo_voice_control all_nodes_launch.py

Audio Setup

Topic: /audio
GitHub: audio_common

Start the audio capturer node:

ros2 run audio_common audio_capturer_node

Manual Node Starting

ASR Node

Publishes a transcription of the most recent 10 seconds of captured audio.

Topic: /asr/text

Start ASR node:

ros2 run robo_voice_control asr_node
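A rolling 10-second window like this is naturally kept in a fixed-size ring buffer. The sketch below uses only the standard library; the sample rate and mono-sample chunking are assumptions (16 kHz is Whisper's expected input rate), not the node's actual internals.

```python
from collections import deque

SAMPLE_RATE = 16_000   # Hz; Whisper's expected input rate (assumed here)
WINDOW_SECONDS = 10

class AudioWindow:
    """Keep only the most recent WINDOW_SECONDS of mono samples."""

    def __init__(self):
        # deque with maxlen drops the oldest samples automatically.
        self._buf = deque(maxlen=SAMPLE_RATE * WINDOW_SECONDS)

    def push_chunk(self, samples):
        """Append a chunk (e.g. from the /audio topic); old samples fall off the front."""
        self._buf.extend(samples)

    def snapshot(self):
        """Return the current window as a list, ready to hand to the ASR model."""
        return list(self._buf)
```

Each ASR cycle then transcribes `snapshot()` rather than re-reading the microphone, so slow inference never blocks audio capture.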

LLM Node

Interprets the ASR text from the topic /asr/text and converts it to commands.

Finetuned Qwen3 0.6B: Finetuned using this repository.

Example:

{"input": "Go forward for 2.8 meters", "output": "MOVE FORWARD 2.8;"}
  
{"input": "Drive in a circle", "output": "COMMAND NOT RECOGNIZED;"}

Publishes: /llm/command_interpretations
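Downstream nodes have to turn strings like `MOVE FORWARD 2.8;` back into structured commands. A sketch of such a parser follows; the grammar beyond the two examples above (verb words, then an optional trailing magnitude, then `;`) is an assumption.

```python
def parse_command(raw: str):
    """Parse an interpretation like "MOVE FORWARD 2.8;" into (action, value).

    Returns None for "COMMAND NOT RECOGNIZED;" or malformed input.
    The grammar beyond the README's two examples is assumed.
    """
    text = raw.strip().rstrip(";").strip()
    if not text or text == "COMMAND NOT RECOGNIZED":
        return None
    parts = text.split()
    # A trailing numeric token is an optional magnitude (e.g. meters).
    try:
        value = float(parts[-1])
        action = "_".join(parts[:-1]).lower()
    except ValueError:
        value = None
        action = "_".join(parts).lower()
    if not action:
        return None
    return action, value
```

Returning None for unrecognized commands lets the control node route straight to a TTS error message instead of guessing at a motion.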

Starting the node:

Parameters:

  • model_path → path to the LLM model, e.g., "/media/psf/DATA SSD/LLMs/finetunes/qwen3_0.6B/checkpoint-310"

Default:

ros2 run robo_voice_control llm_node

With model parameter:

ros2 run robo_voice_control llm_node \
  --ros-args -p model_path:="/media/psf/DATA SSD/LLMs/finetunes/qwen3_0.6B/checkpoint-310"

TTS Node

Install dependencies:

pip install TTS sounddevice numpy
sudo apt install ros-humble-sound-play

Start TTS node:

ros2 run robo_voice_control tts_node

SLAM Setup and Usage

Starting SLAM Toolbox

Important: Transforms must be started before the SLAM toolbox.

Start SLLiDAR Node

sudo chmod 777 /dev/ttyUSB1
ros2 launch sllidar_ros2 view_sllidar_a1_launch.py

Verification

Verify the TF tree is properly configured:

ros2 run tf2_tools view_frames

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments
