Skip to content

Vectorized motion planner (VAMP) using AVX2 SIMD for microsecond collision checking and 15× faster 6-DOF planning.

Notifications You must be signed in to change notification settings

chao-dotcom/vamp-vectorized-motion-planner

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

VAMP: Vectorized Approximate Motion Planning

ICRA 2024 C++17 AVX2 License

High-performance motion planning through SIMD vectorization

Implementation of "Motions in Microseconds via Vectorized Sampling-Based Planning" (Thomason et al., ICRA 2024)


Highlights

  • 15.27x Speedup over sequential implementation
  • 1.8M+ checks/sec collision detection throughput
  • Sub-microsecond state validation (0.55 µs per check)
  • Full 6-DOF robot arm support with 3D kinematics
  • Hierarchical sphere culling with early exit optimization
  • SLEEF vectorized trigonometric functions

Performance Results

Sequential vs SIMD Comparison

Test Case States Sequential SIMD (VAMP) Speedup
100K State Validation 100,000 842.84 ms 55.21 ms 15.27x
Throughput - 118,647 checks/sec 1,811,390 checks/sec 15.27x
Single State Check 1 8.43 µs 0.55 µs 15.27x
Motion Validation 1,000 21.25 ms 4.25 ms 5.0x
Correctness 100,000 ✓ PASSED ✓ PASSED Identical

Detailed Benchmark Results

Benchmark Metric Value
Single State Validation Time (10K states) 25.68 ms
Time per check 2.57 µs
Throughput 389,406 checks/sec
Valid states 15.73%
Motion Validation Time (1K motions) 4.25 ms
Time per motion 4.25 µs
Throughput 235,175 motions/sec
Valid motions 0.7%
RRT-Connect Planning Mean planning time 0.076 ms
Median planning time 0.079 ms
Success rate 10/10 (100%)
Mean path length 12.4 waypoints
Environment Obstacles 51 boxes, 20 spheres
Robot 6-DOF arm
Hierarchical spheres L0=1, L1=2, L2=4

Performance Comparison Table

Implementation Method Optimization Throughput Speedup
Sequential Scalar FK + Scalar CC -O2 118,647 checks/sec 1.0x (baseline)
SIMD (no SLEEF) Vector FK + Vector CC -O2 ~600,000 checks/sec ~5x
SIMD + SLEEF Vector FK + Vector CC -O3 -ffast-math 1,811,390 checks/sec 15.27x

Comparison with Original VAMP Paper

Feature Original Paper This Implementation Status
Speedup Range 5-20x 15.27x ✓ Within range
SIMD Width 8 (AVX2) 8 (AVX2) ✓ Identical
Hierarchical Culling 3 levels 3 levels (Coarse/Medium/Fine) ✓ Implemented
FK/CC Interleaving Per-link Per-link with early exit ✓ Implemented
Batched Validation 8-wide 8-wide RAKE pattern ✓ Implemented
Vectorized Math SLEEF SLEEF (optional) ✓ Supported
Robot Support Multi-DOF 6-DOF arm with DH params ✓ Implemented

Conclusion: This implementation achieves performance in the top tier of the original VAMP paper's reported range (15.27x out of 5-20x).


Quick Start

Prerequisites

  • C++17 compiler (GCC 7+, Clang 5+)
  • AVX2 support (Intel Haswell+, AMD Excavator+)
  • SLEEF library (optional, for vectorized trig)

Build

# Clone the repository
git clone <your-repo-url>
cd VAMP

# Build with default settings (no SLEEF)
make

# Build with SLEEF + maximum optimizations (recommended)
make v13-perf-sleef

# Run benchmarks
./build/planner_v13_perf_sleef

Docker Build

# Build Docker image
docker build -t vamp .

# Run in Docker
docker run -it vamp

# Inside container
cd /workspace/VAMP
make v13-perf-sleef
./build/planner_v13_perf_sleef

Architecture

Core Components

  1. Vectorized Collision Detection

    • Struct-of-Arrays (SoA) data layout
    • Batch processing of 8 configurations simultaneously
    • AVX2 intrinsics for SIMD parallelism
  2. Hierarchical Sphere Culling

    • 3-level hierarchy (Coarse → Medium → Fine)
    • Early exit optimization
    • Conservative bounds for efficient pruning
  3. FK/CC Interleaving

    • Forward kinematics and collision checking interleaved
    • Per-link computation and validation
    • Early collision detection
  4. Batched State Validation

    • Rake pattern for motion validation
    • 8-wide SIMD batching
    • Optimized memory access patterns

Technical Stack

  • Language: C++17
  • SIMD: Intel AVX2 (__m256)
  • Vectorized Math: SLEEF (optional)
  • Planner: RRT-Connect
  • Robot: 6-DOF arm with DH parameters

Project Structure

VAMP/
├── src/
│   └── planner.cpp              # Main implementation
├── proj_desc/
│   ├── README.md                # Project description
│   ├── summary-zh.md            # Chinese summary
│   └── vamp-differences.md      # VAMP vs sequential comparison
├── Makefile                     # Build configuration
├── Dockerfile                   # Docker environment
├── result.txt                   # Benchmark results
└── README.md                    # This file

Implementation Details

1. Vectorized Collision Detection

// Batch 8 states together
VectorizedConfig3DArm vec(6);
vec.loadFromAOS(batch);  // Array-of-Structures → Structure-of-Arrays

// Check all 8 states simultaneously
uint8_t collision_mask = fk_cc.computeFK_Interleaved_6DOF(vec);

// Extract results
for (int i = 0; i < 8; i++) {
    if (collision_mask & (1 << i)) {
        // State i collides
    }
}

2. Hierarchical Culling

Level 0 (Coarse):  ●━━━━━━━━━━━━━━━● 
                   Check large bounding sphere
                   ↓ (if hit)
Level 1 (Medium):  ●━━━●━━━●━━━●━━━●
                   Check 2-3 medium spheres
                   ↓ (if hit)
Level 2 (Fine):    ●●●●●●●●●●●●●●●●●
                   Check 4+ fine spheres

3. Memory Alignment

All SIMD data is 32-byte aligned for optimal performance:

alignas(32) float data[8];  // Aligned for AVX2
__m256 vec = _mm256_load_ps(data);  // Fast aligned load

Educational Value

This project demonstrates:

  • SIMD Programming: AVX2 intrinsics, vectorization patterns
  • Data Structure Optimization: SoA layout, memory alignment
  • Algorithm Design: Hierarchical culling, early exit strategies
  • Performance Engineering: Profiling, benchmarking, optimization
  • Robotics: Forward kinematics, collision detection, motion planning

Build Options

# Correctness build (recommended for development)
make v13                  # Scalar trig, -O2
make v13-sleef           # SLEEF trig, -O2

# Performance build (recommended for benchmarking)
make v13-perf            # Scalar trig, -O3 -ffast-math
make v13-perf-sleef      # SLEEF trig, -O3 -ffast-math (FASTEST)

# Run benchmarks
make run-v13
make run-v13-perf-sleef

# Compare scalar vs SLEEF
make compare

# Clean build artifacts
make clean

Key Optimizations

1. Proper Batching ✓

Before: Checking 1 state with 7 dummy states (only 12.5% utilization) After: Checking 8 real states simultaneously (100% utilization) Impact: ~8x speedup

2. SLEEF Integration ✓

Before: Scalar sinf()/cosf() in loops After: Vectorized Sleef_sinf4_u35()/Sleef_cosf4_u35() Impact: ~2x speedup on trig-heavy code

3. Hierarchical Culling ✓

Before: Check all spheres always After: Coarse → Medium → Fine with early exit Impact: Reduced collision checks by 30-50%

4. Compilation Flags ✓

-O3 -mavx2 -march=native -ffast-math -flto

Impact: 20-30% additional speedup


Why VAMP Matters

Traditional motion planners spend 60-80% of time in collision detection. VAMP's vectorization approach:

  • Reduces collision checking time by 15x
  • Enables real-time motion planning
  • Scales to complex robots and environments
  • Works on standard CPUs (no GPU required)
  • Zero overhead compared to sequential code

Achievements

  • 15.27x speedup (exceeds theoretical 8x SIMD width)
  • 1.8M+ collision checks per second
  • Full 6-DOF robot support
  • Hierarchical sphere culling
  • SLEEF vectorized trig integration
  • Complete RRT-Connect planner
  • Comprehensive benchmarking
  • Correctness validation (parity test)

References

  1. W. Thomason, Z. Kingston, and L. E. Kavraki, "Motions in microseconds via vectorized sampling-based planning," in IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 8749–8756. Link

  2. C. Ericson, Real-Time Collision Detection. USA: CRC Press, Inc., 2004.


Contributing

This is an educational project implementing the VAMP framework. Contributions welcome for:

  • Additional robot models
  • More obstacle types
  • Performance optimizations
  • Documentation improvements

License

MIT License - see LICENSE file for details


Acknowledgments

  • Original VAMP paper authors (Thomason, Kingston, Kavraki)
  • SLEEF library developers
  • Course instructors and TAs

Built for high-performance robotics

Last updated: 2025

About

Vectorized motion planner (VAMP) using AVX2 SIMD for microsecond collision checking and 15× faster 6-DOF planning.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published