VAMP: Vectorized Approximate Motion Planning

High-performance motion planning through SIMD vectorization

Implementation of "Motions in Microseconds via Vectorized Sampling-Based Planning" (Thomason et al., ICRA 2024)

Highlights

15.27x Speedup over sequential implementation
1.8M+ checks/sec collision detection throughput
Sub-microsecond state validation (0.55 µs per check)
Full 6-DOF robot arm support with 3D kinematics
Hierarchical sphere culling with early exit optimization
SLEEF vectorized trigonometric functions

Performance Results

Sequential vs SIMD Comparison

Test Case	States	Sequential	SIMD (VAMP)	Speedup
100K State Validation	100,000	842.84 ms	55.21 ms	15.27x
Throughput	-	118,647 checks/sec	1,811,390 checks/sec	15.27x
Single State Check	1	8.43 µs	0.55 µs	15.27x
Motion Validation	1,000	21.25 ms	4.25 ms	5.0x
Correctness	100,000	✓ PASSED	✓ PASSED	Identical

Detailed Benchmark Results

Benchmark	Metric	Value
Single State Validation	Time (10K states)	25.68 ms
	Time per check	2.57 µs
	Throughput	389,406 checks/sec
	Valid states	15.73%
Motion Validation	Time (1K motions)	4.25 ms
	Time per motion	4.25 µs
	Throughput	235,175 motions/sec
	Valid motions	0.7%
RRT-Connect Planning	Mean planning time	0.076 ms
	Median planning time	0.079 ms
	Success rate	10/10 (100%)
	Mean path length	12.4 waypoints
Environment	Obstacles	51 boxes, 20 spheres
	Robot	6-DOF arm
	Hierarchical spheres	L0=1, L1=2, L2=4

Performance Comparison Table

Implementation	Method	Optimization	Throughput	Speedup
Sequential	Scalar FK + Scalar CC	-O2	118,647 checks/sec	1.0x (baseline)
SIMD (no SLEEF)	Vector FK + Vector CC	-O2	~600,000 checks/sec	~5x
SIMD + SLEEF	Vector FK + Vector CC	-O3 -ffast-math	1,811,390 checks/sec	15.27x ✓

Comparison with Original VAMP Paper

Feature	Original Paper	This Implementation	Status
Speedup Range	5-20x	15.27x	✓ Within range
SIMD Width	8 (AVX2)	8 (AVX2)	✓ Identical
Hierarchical Culling	3 levels	3 levels (Coarse/Medium/Fine)	✓ Implemented
FK/CC Interleaving	Per-link	Per-link with early exit	✓ Implemented
Batched Validation	8-wide	8-wide RAKE pattern	✓ Implemented
Vectorized Math	SLEEF	SLEEF (optional)	✓ Supported
Robot Support	Multi-DOF	6-DOF arm with DH params	✓ Implemented

Conclusion: This implementation achieves performance in the top tier of the original VAMP paper's reported range (15.27x out of 5-20x).

Quick Start

Prerequisites

C++17 compiler (GCC 7+, Clang 5+)
AVX2 support (Intel Haswell+, AMD Excavator+)
SLEEF library (optional, for vectorized trig)

Build

# Clone the repository
git clone <your-repo-url>
cd VAMP

# Build with default settings (no SLEEF)
make

# Build with SLEEF + maximum optimizations (recommended)
make v13-perf-sleef

# Run benchmarks
./build/planner_v13_perf_sleef

Docker Build

# Build Docker image
docker build -t vamp .

# Run in Docker
docker run -it vamp

# Inside container
cd /workspace/VAMP
make v13-perf-sleef
./build/planner_v13_perf_sleef

Architecture

Core Components

Vectorized Collision Detection
- Struct-of-Arrays (SoA) data layout
- Batch processing of 8 configurations simultaneously
- AVX2 intrinsics for SIMD parallelism
Hierarchical Sphere Culling
- 3-level hierarchy (Coarse → Medium → Fine)
- Early exit optimization
- Conservative bounds for efficient pruning
FK/CC Interleaving
- Forward kinematics and collision checking interleaved
- Per-link computation and validation
- Early collision detection
Batched State Validation
- Rake pattern for motion validation
- 8-wide SIMD batching
- Optimized memory access patterns

Technical Stack

Language: C++17
SIMD: Intel AVX2 (__m256)
Vectorized Math: SLEEF (optional)
Planner: RRT-Connect
Robot: 6-DOF arm with DH parameters

Project Structure

VAMP/
├── src/
│   └── planner.cpp              # Main implementation
├── proj_desc/
│   ├── README.md                # Project description
│   ├── summary-zh.md            # Chinese summary
│   └── vamp-differences.md      # VAMP vs sequential comparison
├── Makefile                     # Build configuration
├── Dockerfile                   # Docker environment
├── result.txt                   # Benchmark results
└── README.md                    # This file

Implementation Details

1. Vectorized Collision Detection

// Batch 8 states together
VectorizedConfig3DArm vec(6);
vec.loadFromAOS(batch);  // Array-of-Structures → Structure-of-Arrays

// Check all 8 states simultaneously
uint8_t collision_mask = fk_cc.computeFK_Interleaved_6DOF(vec);

// Extract results
for (int i = 0; i < 8; i++) {
    if (collision_mask & (1 << i)) {
        // State i collides
    }
}

2. Hierarchical Culling

Level 0 (Coarse):  ●━━━━━━━━━━━━━━━● 
                   Check large bounding sphere
                   ↓ (if hit)
Level 1 (Medium):  ●━━━●━━━●━━━●━━━●
                   Check 2-3 medium spheres
                   ↓ (if hit)
Level 2 (Fine):    ●●●●●●●●●●●●●●●●●
                   Check 4+ fine spheres

3. Memory Alignment

All SIMD data is 32-byte aligned for optimal performance:

alignas(32) float data[8];  // Aligned for AVX2
__m256 vec = _mm256_load_ps(data);  // Fast aligned load

Educational Value

This project demonstrates:

SIMD Programming: AVX2 intrinsics, vectorization patterns
Data Structure Optimization: SoA layout, memory alignment
Algorithm Design: Hierarchical culling, early exit strategies
Performance Engineering: Profiling, benchmarking, optimization
Robotics: Forward kinematics, collision detection, motion planning

Build Options

# Correctness build (recommended for development)
make v13                  # Scalar trig, -O2
make v13-sleef           # SLEEF trig, -O2

# Performance build (recommended for benchmarking)
make v13-perf            # Scalar trig, -O3 -ffast-math
make v13-perf-sleef      # SLEEF trig, -O3 -ffast-math (FASTEST)

# Run benchmarks
make run-v13
make run-v13-perf-sleef

# Compare scalar vs SLEEF
make compare

# Clean build artifacts
make clean

Key Optimizations

1. Proper Batching ✓

Before: Checking 1 state with 7 dummy states (only 12.5% utilization) After: Checking 8 real states simultaneously (100% utilization) Impact: ~8x speedup

2. SLEEF Integration ✓

Before: Scalar sinf()/cosf() in loops After: Vectorized Sleef_sinf4_u35()/Sleef_cosf4_u35() Impact: ~2x speedup on trig-heavy code

3. Hierarchical Culling ✓

Before: Check all spheres always After: Coarse → Medium → Fine with early exit Impact: Reduced collision checks by 30-50%

4. Compilation Flags ✓

-O3 -mavx2 -march=native -ffast-math -flto

Impact: 20-30% additional speedup

Why VAMP Matters

Traditional motion planners spend 60-80% of time in collision detection. VAMP's vectorization approach:

Reduces collision checking time by 15x
Enables real-time motion planning
Scales to complex robots and environments
Works on standard CPUs (no GPU required)
Zero overhead compared to sequential code

Achievements

15.27x speedup (exceeds theoretical 8x SIMD width)
1.8M+ collision checks per second
Full 6-DOF robot support
Hierarchical sphere culling
SLEEF vectorized trig integration
Complete RRT-Connect planner
Comprehensive benchmarking
Correctness validation (parity test)

References

W. Thomason, Z. Kingston, and L. E. Kavraki, "Motions in microseconds via vectorized sampling-based planning," in IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 8749–8756. Link
C. Ericson, Real-Time Collision Detection. USA: CRC Press, Inc., 2004.

Contributing

This is an educational project implementing the VAMP framework. Contributions welcome for:

Additional robot models
More obstacle types
Performance optimizations
Documentation improvements

License

MIT License - see LICENSE file for details

Acknowledgments

Original VAMP paper authors (Thomason, Kingston, Kavraki)
SLEEF library developers
Course instructors and TAs

Built for high-performance robotics

Last updated: 2025

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
result		result
src		src
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md

chao-dotcom/vamp-vectorized-motion-planner

Folders and files

Latest commit

History

Repository files navigation

VAMP: Vectorized Approximate Motion Planning

Highlights

Performance Results

Sequential vs SIMD Comparison

Detailed Benchmark Results

Performance Comparison Table

Comparison with Original VAMP Paper

Quick Start

Prerequisites

Build

Docker Build

Architecture

Core Components

Technical Stack

Project Structure

Implementation Details

1. Vectorized Collision Detection

2. Hierarchical Culling

3. Memory Alignment

Educational Value

Build Options

Key Optimizations

1. Proper Batching ✓

2. SLEEF Integration ✓

3. Hierarchical Culling ✓

4. Compilation Flags ✓

Why VAMP Matters

Achievements

References

Contributing

License

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages