A complete computer vision system for detecting objects (cube, sphere, cylinder) in 3D space using PyBullet simulation, YOLOv8, and depth estimation. Perfect for robotics projects, pick-and-place tasks, and autonomous navigation.
This system:
- Detects objects in a simulated camera view using YOLOv8
- Measures distance to each object using depth buffers
- Calculates 3D position (X, Y, Z coordinates) in world space
- Provides accurate target coordinates for robot path planning
Perfect for robotics students and researchers working on object manipulation and navigation tasks!
```
Detect/
├── config/
│   └── config.yaml             # All system parameters
├── scripts/
│   ├── test_detection.py       # Test detection accuracy
│   └── test_path.py            # Test path planning interface
├── src/
│   ├── detection/
│   │   └── detector.py         # YOLO object detector
│   ├── perception/
│   │   └── depth_estimator.py  # Distance & position calculation
│   ├── simulation/
│   │   ├── camera.py           # PyBullet camera interface
│   │   └── environment.py      # Simulation environment
│   ├── tracking/
│   │   └── object_tracker.py   # Object tracking (optional)
│   └── vision_system.py        # Main vision system
├── weights/
│   └── best.pt                 # Trained YOLOv8 model (2000 images)
└── requirements.txt            # Python dependencies
```
Option A: Miniconda (Recommended - Lighter)
- Download Miniconda for Windows from: https://docs.conda.io/en/latest/miniconda.html
- Choose "Miniconda3 Windows 64-bit" installer
- Run the installer (.exe file)
- Important: Check "Add Miniconda3 to my PATH environment variable" during installation
- Complete the installation
Option B: Anaconda (Full Package)
- Download from: https://www.anaconda.com/download
- Run the installer
- Follow the installation wizard
Open Command Prompt or Anaconda Prompt and test:
```bash
conda --version
# Should show: conda 23.x.x or similar
```

If you get an error:
- Close and reopen Command Prompt
- Or search for "Anaconda Prompt" in the Windows Start Menu and use that instead

Then update conda:

```bash
conda update conda
```

Issue: "conda is not recognized"
- Use Anaconda Prompt instead of regular Command Prompt
- Or add conda to PATH:
- Search "Environment Variables" in Windows
- Edit "Path" variable
- Add: `C:\Users\YourUsername\miniconda3\Scripts`
- Add: `C:\Users\YourUsername\miniconda3\Library\bin`
Issue: Permission Denied
- Run Command Prompt or Anaconda Prompt as Administrator
- Right-click → "Run as administrator"
Issue: Slow conda commands
```bash
# Use the faster libmamba solver
conda install -n base conda-libmamba-solver
conda config --set solver libmamba
```

For Windows Users:
```bash
# Open Anaconda Prompt (NOT regular Command Prompt)

# Navigate to your project directory
cd C:\path\to\your\folder

# Clone the repository
git clone <your-repo-url>
cd Detect

# Create conda virtual environment
conda create -n robot_vision python=3.11
# Press 'y' when asked to proceed

# Activate environment
conda activate robot_vision
# You should see (robot_vision) in your prompt

# Install dependencies
pip install -r requirements.txt
# This may take 5-10 minutes - be patient!
```

For Linux/Mac Users:
```bash
# Open Terminal
cd /path/to/your/folder

# Clone the repository
git clone <your-repo-url>
cd Detect

# Create conda virtual environment
conda create -n robot_vision python=3.11
conda activate robot_vision

# Install dependencies
pip install -r requirements.txt
```

Note: This project was developed using conda for environment management. Python 3.11 is recommended.
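Once installation finishes, a short script can confirm that the core packages import correctly (the names below are the *import* names of the dependencies listed later in this README, which differ from some pip package names):

```python
import importlib

def missing_packages(names):
    """Return the subset of import names that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# Import names for the project's main dependencies
deps = ["pybullet", "ultralytics", "cv2", "numpy", "torch", "yaml"]
for name in missing_packages(deps):
    print(f"Missing: {name} - try reinstalling with: pip install -r requirements.txt")
```

No output means everything imported successfully.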
Test Detection Accuracy:

```bash
python scripts/test_detection.py --scenes 5 --objects 3
```

Test Path Planning Interface:

```bash
python scripts/test_path.py
```

Python usage example:

```python
import sys
sys.path.append('src')

from simulation.environment import RobotEnvironment
from simulation.camera import Camera
from vision_system import VisionSystem

# Setup
env = RobotEnvironment(gui=True)
env.spawn_random_scene(num_each_type=2)
camera = Camera()
vision = VisionSystem(camera, model_path="weights/best.pt")

# Run detection
result = vision.detect_and_measure()

# Get target coordinates for path planning
for detection in result['detections']:
    target_x = detection['position'][0]  # X coordinate
    target_y = detection['position'][1]  # Y coordinate
    object_type = detection['class_name']
    print(f"{object_type} at [{target_x:.3f}, {target_y:.3f}]")
```

- Trained on 2000 synthetic images
- Detects: cube, sphere, cylinder
- Returns bounding boxes and confidence scores
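As a sketch of how raw model output maps to labelled detections (the tuple format here is illustrative, not the actual Ultralytics return type; class ids follow config/config.yaml):

```python
# Class ids as defined in config/config.yaml: cube=0, sphere=1, cylinder=2
CLASS_NAMES = {0: "cube", 1: "sphere", 2: "cylinder"}

def label_detections(raw, conf_threshold=0.5):
    """Turn (class_id, confidence, bbox) tuples into labelled dicts,
    dropping anything below the confidence threshold."""
    return [
        {"class_name": CLASS_NAMES[cid], "confidence": conf, "bbox": bbox}
        for cid, conf, bbox in raw
        if conf >= conf_threshold
    ]

# Illustrative raw output: one confident cube, one low-confidence sphere
raw = [(0, 0.91, (120, 80, 200, 160)), (1, 0.32, (300, 90, 360, 150))]
print(label_detections(raw))  # only the cube survives the 0.5 threshold
```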
- Uses PyBullet's depth buffer
- Converts depth to real-world meters
- Validates measurements for accuracy
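The depth-to-meters step can be sketched as follows. The formula is the standard OpenGL non-linear depth inversion used by PyBullet's `getCameraImage` depth buffer; the `near`/`far` defaults here are placeholders and must match the clipping planes of your projection matrix:

```python
def depth_buffer_to_meters(d, near=0.1, far=10.0):
    """Convert a normalized PyBullet depth buffer value (in [0, 1])
    to metric depth along the camera's viewing axis."""
    return far * near / (far - (far - near) * d)

# Sanity checks: buffer value 0 is the near plane, 1 is the far plane
print(depth_buffer_to_meters(0.0))  # -> 0.1
print(depth_buffer_to_meters(1.0))  # close to 10.0
```

For a full depth image, the same expression applies elementwise to the NumPy array returned by `getCameraImage`.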
- Transforms pixel coordinates to 3D world coordinates
- Uses camera intrinsics (focal length, principal point)
- Accounts for camera pose and orientation
World Frame:
X-axis: Left/Right
Y-axis: Forward/Back
Z-axis: Up/Down
Camera Frame:
Position: [0, -1.8, 0.6] (behind workspace, elevated)
Looking at: [0, 0, 0.4] (table height)
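A minimal sketch of the intrinsics-based back-projection described above, assuming the config values (640x480, fov 60; PyBullet's fov is the vertical field of view, and the principal point is taken at the image center). Converting the resulting camera-frame point into the world frame additionally requires the camera pose (the view matrix built from the position and look-at target above), which is omitted here:

```python
import math

def intrinsics_from_fov(width=640, height=480, fov_deg=60.0):
    """Pinhole intrinsics from a vertical field of view; with square
    pixels fx == fy, and the principal point sits at the image center."""
    fy = (height / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    fx = fy
    cx, cy = width / 2.0, height / 2.0
    return fx, fy, cx, cy

def pixel_to_camera(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with metric depth into the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m

fx, fy, cx, cy = intrinsics_from_fov()
# The image center back-projects onto the optical axis
print(pixel_to_camera(320, 240, 1.5, fx, fy, cx, cy))  # -> (0.0, 0.0, 1.5)
```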
Trained Model Performance:
- Training Dataset: 2000 images
- Distance Range: 0.8m - 1.8m
- Angle Range: -15° to +15°
Expected Accuracy:
- F1 Score Target: ≥ 80%
- Distance Error: < 15cm
- Position Error: < 20cm
All parameters are in config/config.yaml:
```yaml
objects:
  sphere:
    diameter: 0.15
    color: [0.0, 0.0, 1.0, 1.0]  # Blue
    class_id: 1
  cube:
    size: [0.20, 0.20, 0.20]
    color: [1.0, 0.0, 0.0, 1.0]  # Red
    class_id: 0
  cylinder:
    height: 0.16
    radius: 0.06
    color: [0.0, 1.0, 0.0, 1.0]  # Green
    class_id: 2

camera:
  width: 640
  height: 480
  fov: 60
  position: [0, -1.8, 0.6]
  target: [0, 0, 0.4]

detection:
  model: "yolov8n.pt"
  confidence_threshold: 0.5
  iou_threshold: 0.45
```

```bash
# Test with 10 scenes, 5 objects each
python scripts/test_detection.py --scenes 10 --objects 5

# Test without visualization (faster)
python scripts/test_detection.py --scenes 20 --objects 3 --no-viz
```

TEST RESULTS
======================================================================
Overall Performance:
Total objects: 50
Detected: 45
Missed: 5
False positives: 2
Metrics:
Precision: 95.74%
Recall: 90.00%
F1 Score: 92.78%
Distance Estimation:
Mean error: 0.089m
Std dev: 0.067m
Max error: 0.234m
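These metrics follow the standard precision/recall/F1 definitions and can be reproduced from the raw counts above:

```python
def detection_metrics(detected, missed, false_positives):
    """Precision, recall, and F1 from raw detection counts.
    Detected objects count as true positives."""
    precision = detected / (detected + false_positives)
    recall = detected / (detected + missed)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = detection_metrics(detected=45, missed=5, false_positives=2)
print(f"Precision: {p:.2%}  Recall: {r:.2%}  F1: {f1:.2%}")
# -> Precision: 95.74%  Recall: 90.00%  F1: 92.78%
```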
Always use Anaconda Prompt and make sure the environment is activated:

```bash
# Check if environment is active (you should see it in the prompt)
# Correct: (robot_vision) C:\Users\YourName\Detect>
# Wrong:   C:\Users\YourName\Detect>

# If not active, activate it:
conda activate robot_vision

# Then run scripts:
python scripts\test_detection.py
python scripts\test_path.py
```

Use backslashes `\` or forward slashes `/`:

```bash
# Both work on Windows:
python scripts\test_detection.py
python scripts/test_detection.py
```

Useful terminal commands:
- `Ctrl + C`: Stop a running script
- `cls`: Clear the terminal screen
- `dir`: List files (instead of `ls`)
- `cd ..`: Go up one directory
If you have an NVIDIA GPU:

```bash
# Install CUDA-enabled PyTorch
pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```

Pick-and-place:

```python
# Detect objects
result = vision.detect_and_measure()

# Plan a path to each object
for obj in result['detections']:
    target = [obj['position'][0], obj['position'][1]]
    robot.move_to(target)
    robot.pick_up(obj['class_name'])
```

Sorting by type:

```python
# Detect and sort by type
cubes = [d for d in detections if d['class_name'] == 'cube']
spheres = [d for d in detections if d['class_name'] == 'sphere']

# Process each category
for cube in cubes:
    robot.move_to_bin('cube_bin', cube['position'])
```

Obstacle avoidance:

```python
# Detect obstacles
obstacles = vision.detect_and_measure()['detections']

# Get obstacle positions for the path planner
obstacle_positions = [d['position'][:2] for d in obstacles]
path = path_planner.plan(start, goal, obstacles=obstacle_positions)
```

Environment Management:
- Conda: This project uses conda for virtual environment management
- Python: 3.11 (recommended)
Main libraries:
- PyBullet (≥3.2.5): Physics simulation
- YOLOv8/Ultralytics (≥8.0.0): Object detection
- OpenCV (≥4.8.0): Image processing
- NumPy (≥1.24.0): Numerical computing
- PyTorch (≥2.0.0): Deep learning backend
- PyYAML (≥6.0): Configuration files
See requirements.txt for complete list.
Managing your environment:

```bash
# Activate environment
conda activate robot_vision

# Deactivate when done
conda deactivate

# Remove environment (if needed)
conda env remove -n robot_vision
```

Created for robotics research and education. Trained on 2000 synthetic images generated in PyBullet simulation. By THU HTOO ZAW, THIRI TOE TOE ZIN.