feat: add multi-axis support for xt::roll (numpy.roll parity) #2877

f14XuanLv · 2025-12-03T11:42:49Z

Checklist

- The title and commit message(s) are descriptive.
- Small commits made to fix your PR have been squashed to avoid history pollution.
- Tests have been added for new features or bug fixes.
- APIs of new functions and classes are documented.

Description

This PR extends the xt::roll function to support multiple axes simultaneously, completing the NumPy-compatible API as originally requested.

Related: #1766, #1823

The original implementation in #1823 only supported single-axis roll operations. According to NumPy's roll signature, both shift and axis parameters can be tuples of ints:

```python
numpy.roll(a, shift, axis=None)
# shift : int or tuple of ints
# axis : int or tuple of ints, optional
```
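In NumPy's tuple form, each `shift[k]` is paired with `axis[k]`. A negative axis counts from the end, shifts on the same axis add up, and each total only matters modulo that axis's length. The sketch below collapses the pairs into one non-negative shift per axis under those rules. `effective_shifts` is a hypothetical helper written for illustration, not an xtensor function. It assumes `shift` and `axis` have the same length; NumPy can also apply a single int shift to every listed axis, which the sketch leaves out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of xtensor): collapse (shift, axis) pairs into
// one effective shift per axis, following numpy.roll's tuple semantics.
std::vector<std::size_t> effective_shifts(const std::vector<std::size_t>& shape,
                                          const std::vector<std::ptrdiff_t>& shifts,
                                          const std::vector<std::ptrdiff_t>& axes)
{
    if (shifts.size() != axes.size())
    {
        throw std::invalid_argument("shift and axis must have the same length");
    }

    const auto ndim = static_cast<std::ptrdiff_t>(shape.size());
    std::vector<std::ptrdiff_t> total(shape.size(), 0);
    for (std::size_t k = 0; k < axes.size(); ++k)
    {
        std::ptrdiff_t ax = axes[k];
        if (ax < 0)
        {
            ax += ndim;  // negative axes count from the end
        }
        if (ax < 0 || ax >= ndim)
        {
            throw std::out_of_range("axis out of range");
        }
        total[static_cast<std::size_t>(ax)] += shifts[k];  // repeated axes accumulate
    }

    std::vector<std::size_t> result(shape.size(), 0);
    for (std::size_t d = 0; d < shape.size(); ++d)
    {
        const auto n = static_cast<std::ptrdiff_t>(shape[d]);
        if (n == 0)
        {
            continue;  // empty axis: nothing to roll
        }
        std::ptrdiff_t s = total[d] % n;  // C++ % keeps the sign of total[d]
        if (s < 0)
        {
            s += n;  // map into [0, n), like Python's %
        }
        result[d] = static_cast<std::size_t>(s);
    }
    return result;
}
```

For a 3x4 array, `shift = (1, 2, -1)` with `axis = (0, -1, 1)` becomes one shift of 1 on each axis, because axis `-1` is axis 1 and `2 + (-1) = 1`.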

This PR adds the missing multi-axis support:

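```cpp
xt::xarray<int> a = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
auto result = xt::roll(a, {1, 2}, {0, 1});  // roll 1 on axis 0, roll 2 on axis 1
// expected, as with numpy.roll(a, (1, 2), axis=(0, 1)): {{8, 9, 7}, {2, 3, 1}, {5, 6, 4}}
```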
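The call above passes `shift` and `axis` as braced lists. A braced-init-list has no type, so a template parameter such as `const S&` cannot be deduced from `{1, 2}`, while a reference-to-array parameter `const T (&)[N]` can. That is presumably why the first commit adds a C-style array overload next to the container one. The sketch below shows the deduction rule with hypothetical `roll_pairs` functions that only count the (shift, axis) pairs; they are not xtensor's actual signatures.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the two overload kinds described in the PR;
// they only report how many (shift, axis) pairs they received.

// Container overload: S and X can be std::vector, std::array, ...
// Not viable for braced lists, because S and X cannot be deduced from {..}.
template <class S, class X>
std::size_t roll_pairs(const S& shift, const X& axis)
{
    return shift.size() == axis.size() ? shift.size() : 0;
}

// C-style array overload: roll_pairs({1, 2}, {0, 1}) deduces S = X = int, N = 2.
template <class S, class X, std::size_t N>
std::size_t roll_pairs(const S (&)[N], const X (&)[N])
{
    return N;
}
```

With this pair of overloads, `roll_pairs({1, 2}, {0})` does not compile, because `N` is deduced as 2 from one argument and 1 from the other. A shift/axis length mismatch written as braced lists is therefore caught at compile time.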

Commits

1. `feat: add multi-axis support for xt::roll with optimized pointer arithmetic`

   - Implemented `detail::roll_multi()` using a recursive pointer-arithmetic algorithm extended from the existing single-axis implementation
   - Added two new `xt::roll()` overloads: one for containers (`std::vector`, `std::array`) and one for C-style arrays
   - Supports accumulated shifts when the same axis appears multiple times (NumPy-compatible behavior)
   - Supports negative axis indices
   - Added unit tests covering 2D/3D arrays, negative shifts, different container types, column-major layout, and other cases

2. `docs: add multi-axis roll examples to documentation`

   - Updated `docs/source/numpy.rst` with a NumPy-to-xtensor comparison for multi-axis roll
   - Updated `docs/source/quickref/manipulation.rst` with usage examples

3. `bench: add benchmark for multi-axis roll`

   - Added `benchmark/benchmark_roll.cpp`, which compares sequential single-axis rolls against the new multi-axis implementation
   - Benchmarks cover 2D to 5D tensors, several sizes, and image-processing scenarios (RGB 1080p to 8K)
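The first commit describes `detail::roll_multi()` as a recursive pointer-arithmetic algorithm. Below is one way a single-pass roll of that kind can work on a dense row-major buffer. It is an illustration, not the PR's code, and `roll_block` / `roll_all_axes` are hypothetical names. Along each outer axis, sub-block `i` of the input is written to sub-block `(i + s) % n` of the output. On the innermost axis the same mapping reduces to two contiguous copies. Every element is therefore read and written once, and no intermediate array is allocated. Shifts are assumed to be already normalized to `[0, n)`, and only row-major layout is handled.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not the PR's detail::roll_multi): roll every axis >= dim
// of a row-major block from src into dst. shift[d] must be in [0, shape[d]).
template <class T>
void roll_block(const T* src, T* dst,
                const std::vector<std::size_t>& shape,
                const std::vector<std::size_t>& strides,
                const std::vector<std::size_t>& shift,
                std::size_t dim)
{
    const std::size_t n = shape[dim];
    const std::size_t s = shift[dim];
    if (dim + 1 == shape.size())
    {
        // Innermost axis: element i goes to (i + s) % n, i.e. two contiguous copies.
        std::copy(src, src + (n - s), dst + s);
        std::copy(src + (n - s), src + n, dst);
        return;
    }
    // Outer axis: sub-block i of the input becomes sub-block (i + s) % n of the output.
    for (std::size_t i = 0; i < n; ++i)
    {
        const std::size_t j = (i + s) % n;
        roll_block(src + i * strides[dim], dst + j * strides[dim], shape, strides, shift, dim + 1);
    }
}

// Hypothetical entry point: roll a dense row-major buffer along all axes at once.
template <class T>
std::vector<T> roll_all_axes(const std::vector<T>& data,
                             const std::vector<std::size_t>& shape,
                             const std::vector<std::size_t>& shift)
{
    if (data.empty() || shape.empty())
    {
        return data;  // empty or 0-d array: nothing to roll
    }
    // Row-major strides, in elements.
    std::vector<std::size_t> strides(shape.size(), 1);
    for (std::size_t d = shape.size() - 1; d > 0; --d)
    {
        strides[d - 1] = strides[d] * shape[d];
    }
    std::vector<T> out(data.size());
    roll_block(data.data(), out.data(), shape, strides, shift, 0);
    return out;
}
```

For a column-major array the recursion would instead run from the last axis down to axis 0, so that the contiguous copies happen along the fastest-varying axis.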

Build & Test

```bash
# Run unit tests (from the repository root)
mkdir build && cd build
cmake .. -DBUILD_TESTS=ON -DDOWNLOAD_GTEST=ON
make -j8 test_xmanipulation
./test/test_xmanipulation --test-case="*roll*" -s

# Run benchmarks (also from the repository root; xtensor_DIR points at the build directory above)
cd benchmark
mkdir build && cd build
cmake .. -DDOWNLOAD_GBENCHMARK=ON -Dxtensor_DIR=$(realpath ../../build)
make -j8 benchmark_xtensor
./benchmark_xtensor --benchmark_filter=roll
```

📊 Benchmark Results

Test Environment: Intel Core i7-12700H, 16GB RAM, Ubuntu 24.04.3 LTS

```
Run on (20 X 4600 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x10)
  L1 Instruction 32 KiB (x10)
  L2 Unified 1280 KiB (x10)
  L3 Unified 24576 KiB (x1)

---------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------
detail::roll_3d_3axes_sequential/128/r0.01       11735715 ns     11731715 ns           51 bytes_per_second=1.33Gi/s
detail::roll_3d_3axes_multi/128/r0.01             1184996 ns      1184922 ns          574 bytes_per_second=13.19Gi/s
detail::roll_3d_3axes_sequential/128/r0.3        12288778 ns     12288030 ns           56 bytes_per_second=1.27Gi/s
detail::roll_3d_3axes_multi/128/r0.3              1267342 ns      1267286 ns          548 bytes_per_second=12.33Gi/s
detail::roll_2d_sequential/64x64                     1841 ns         1841 ns       382473 bytes_per_second=16.57Gi/s
detail::roll_2d_multi/64x64                           854 ns          854 ns       821232 bytes_per_second=35.74Gi/s
detail::roll_2d_sequential/256x256                  30787 ns        30784 ns        22481 bytes_per_second=15.86Gi/s
detail::roll_2d_multi/256x256                       12725 ns        12724 ns        53445 bytes_per_second=38.37Gi/s
detail::roll_2d_sequential/1024x1024               963154 ns       963069 ns          736 bytes_per_second=8.11Gi/s
detail::roll_2d_multi/1024x1024                    419272 ns       419185 ns         1857 bytes_per_second=18.64Gi/s
detail::roll_3d_2axes_sequential/128x128x128     10321186 ns     10321178 ns           66 bytes_per_second=1.51Gi/s
detail::roll_3d_2axes_multi/128x128x128           1200420 ns      1200325 ns          561 bytes_per_second=13.02Gi/s
detail::roll_4d_4axes_sequential/16                 95978 ns        95957 ns         7380 bytes_per_second=5.09Gi/s
detail::roll_4d_4axes_multi/16                      26161 ns        26162 ns        26964 bytes_per_second=18.66Gi/s
detail::roll_4d_4axes_sequential/32              10419965 ns     10419994 ns           67 bytes_per_second=767.76Mi/s
detail::roll_4d_4axes_multi/32                     422750 ns       422731 ns         1665 bytes_per_second=18.48Gi/s
detail::roll_5d_5axes_sequential/16              11791085 ns     11790205 ns           61 bytes_per_second=678.53Mi/s
detail::roll_5d_5axes_multi/16                     508597 ns       508588 ns         1388 bytes_per_second=15.36Gi/s
detail::roll_5d_5axes_sequential/32             482957914 ns    482920008 ns            2 bytes_per_second=530.11Mi/s
detail::roll_5d_5axes_multi/32                  100002192 ns     99992129 ns            7 bytes_per_second=2.50Gi/s
detail::roll_3d_2axes_sequential/rgb_1080p       36946640 ns     36943762 ns           19 bytes_per_second=1.25Gi/s
detail::roll_3d_2axes_multi/rgb_1080p            17183395 ns     17182811 ns           39 bytes_per_second=2.70Gi/s
detail::roll_3d_2axes_sequential/rgb_4K         158555547 ns    158541395 ns            4 bytes_per_second=1.17Gi/s
detail::roll_3d_2axes_multi/rgb_4K               75991080 ns     75982624 ns            9 bytes_per_second=2.44Gi/s
detail::roll_3d_2axes_sequential/rgb_8K         606234407 ns    606221933 ns            1 bytes_per_second=1.22Gi/s
detail::roll_3d_2axes_multi/rgb_8K              294193380 ns    294178950 ns            2 bytes_per_second=2.52Gi/s
```

Performance Summary

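The multi-axis implementation is faster than a chain of single-axis roll calls in every benchmarked case. Speedup is the sequential wall time divided by the multi-axis wall time from the results above:

| Scenario | Speedup |
| --- | --- |
| 2D tensors (64x64 to 1024x1024) | ~2.2-2.4x |
| 3D tensors, size 128 (2-3 axes) | ~8.6-9.9x |
| 4D tensors (4 axes) | ~3.7-24.6x |
| 5D tensors (5 axes) | ~4.8-23.2x |
| RGB images (1080p to 8K, 2 axes) | ~2.1x |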
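Each speedup figure is the ratio of the `Time` column for a sequential/multi pair in the benchmark output. The short program below recomputes the ratios from those numbers. `BenchPair` and `kResults` are names made up for this illustration, and the benchmark names are shortened. For example, 5D/16 gives 11791085 / 508597 ≈ 23.2x and 4D/32 gives 10419965 / 422750 ≈ 24.6x.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Wall-clock times in ns, copied from the "Time" column of the benchmark output.
struct BenchPair
{
    std::string name;
    double sequential_ns;
    double multi_ns;
};

// Speedup of the multi-axis roll over the chain of single-axis rolls.
double speedup(const BenchPair& p)
{
    return p.sequential_ns / p.multi_ns;
}

const std::vector<BenchPair> kResults = {
    {"roll_3d_3axes/128/r0.01", 11735715, 1184996},
    {"roll_3d_3axes/128/r0.3", 12288778, 1267342},
    {"roll_2d/64x64", 1841, 854},
    {"roll_2d/256x256", 30787, 12725},
    {"roll_2d/1024x1024", 963154, 419272},
    {"roll_3d_2axes/128x128x128", 10321186, 1200420},
    {"roll_4d_4axes/16", 95978, 26161},
    {"roll_4d_4axes/32", 10419965, 422750},
    {"roll_5d_5axes/16", 11791085, 508597},
    {"roll_5d_5axes/32", 482957914, 100002192},
    {"roll_3d_2axes/rgb_1080p", 36946640, 17183395},
    {"roll_3d_2axes/rgb_4K", 158555547, 75991080},
    {"roll_3d_2axes/rgb_8K", 606234407, 294193380},
};

// Prints one "name  speedup" line per benchmark pair.
void print_speedups()
{
    for (const BenchPair& p : kResults)
    {
        std::printf("%-28s %6.2fx\n", p.name.c_str(), speedup(p));
    }
}
```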

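The gain generally grows with the number of rolled axes. Chaining k single-axis rolls reads and writes the whole array k times and allocates k-1 intermediate arrays, whereas the multi-axis version performs the roll in a single pass through the data with no intermediate temporaries. The gain is not monotonic in array size, though: 4D goes from 3.7x at size 16 to 24.6x at size 32, while 5D drops from 23.2x at size 16 to 4.8x at size 32.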
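The single-pass argument relies on rolls along different axes composing. Applying the shifts one axis at a time gives the same result as moving every element straight to its final position, but each intermediate step costs a full read/write pass and a new allocation. The stdlib-only sketch below checks that equivalence on a small 3D array. `roll_single_pass` and `roll_sequential` are hypothetical helpers, not xtensor functions, and they use a per-element index mapping rather than the contiguous-copy approach the PR describes.

```cpp
#include <cstddef>
#include <vector>

using Shape = std::vector<std::size_t>;

// Row-major strides, in elements.
Shape row_major_strides(const Shape& shape)
{
    Shape strides(shape.size(), 1);
    for (std::size_t d = shape.size(); d > 1; --d)
    {
        strides[d - 2] = strides[d - 1] * shape[d - 1];
    }
    return strides;
}

// Hypothetical single pass: each element is read once and written once to its
// rolled position. shifts[d] must be in [0, shape[d]).
std::vector<int> roll_single_pass(const std::vector<int>& in, const Shape& shape, const Shape& shifts)
{
    const Shape strides = row_major_strides(shape);
    std::vector<int> out(in.size());
    for (std::size_t flat = 0; flat < in.size(); ++flat)
    {
        std::size_t dst = 0;
        for (std::size_t d = 0; d < shape.size(); ++d)
        {
            const std::size_t i = (flat / strides[d]) % shape[d];  // coordinate along axis d
            dst += ((i + shifts[d]) % shape[d]) * strides[d];
        }
        out[dst] = in[flat];
    }
    return out;
}

// Hypothetical sequential version: one full pass and one new array per rolled axis.
std::vector<int> roll_sequential(std::vector<int> data, const Shape& shape, const Shape& shifts)
{
    for (std::size_t axis = 0; axis < shape.size(); ++axis)
    {
        if (shifts[axis] == 0)
        {
            continue;
        }
        Shape one_axis(shape.size(), 0);
        one_axis[axis] = shifts[axis];
        data = roll_single_pass(data, shape, one_axis);  // allocates a temporary each time
    }
    return data;
}
```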

feat: add multi-axis support for xt::roll with optimized pointer arithmetic

4640852

f14XuanLv changed the title ~~Feature/roll multi axis~~ feat: add multi-axis support for xt::roll Dec 3, 2025

f14XuanLv changed the title ~~feat: add multi-axis support for xt::roll~~ feat: add multi-axis support for xt::roll (numpy.roll parity) Dec 3, 2025

f14XuanLv added 2 commits December 3, 2025 21:38

docs: add multi-axis roll examples to documentation

eec6188

bench: add benchmark for multi-axis roll

286aae0

f14XuanLv force-pushed the feature/roll-multi-axis branch from 43edec3 to 286aae0 December 3, 2025 13:38

f14XuanLv mentioned this pull request Dec 5, 2025 in fix: correct negative axis handling in roll function #2878 (open)

f14XuanLv marked this pull request as draft December 5, 2025 15:34
