Conversation

@TwentyPast4
Contributor

Type

  • Bug fix (non-breaking change which fixes an issue): Fixes #
  • New feature (non-breaking change which adds functionality): Resolves #
  • Breaking change (fix or feature that would cause existing functionality to not work as expected): Resolves #

Motivation and Context

Gram and row Gram matrix computations (i.e. A.T @ A and A @ A.T) are relatively common in linear algebra (e.g. least squares, linear-independence tests, ML kernels, ...).
If you execute A.T().Matmul(A), at least one of the two matrices in the matmul is non-contiguous, so matmul first makes a contiguous copy, which can be a noticeable performance loss.
Gram() and RowGram() are implemented on a single matrix, with the transposition handled inside the gemm call. This means that if A is contiguous, no copy is performed, which is not true for A.T().Matmul(A).
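The contiguity issue can be illustrated with NumPy (a sketch of the same memory-layout behavior, not Open3D code): a transpose is a strided view, so the transpose of a C-contiguous matrix is not itself C-contiguous, and a BLAS gemm call that wants contiguous row-major input must copy it first. Passing a transpose flag to gemm instead, as Gram()/RowGram() do, avoids that copy.

```python
import numpy as np

# A is C-contiguous (row-major), but its transpose is only a strided view.
A = np.arange(6, dtype=np.float64).reshape(2, 3)
print(A.flags["C_CONTIGUOUS"])    # True
print(A.T.flags["C_CONTIGUOUS"])  # False

# Both the Gram matrix A.T @ A and the row Gram matrix A @ A.T can be
# computed from the single contiguous buffer of A by telling gemm to
# treat one operand as transposed.
gram = A.T @ A       # shape (3, 3)
row_gram = A @ A.T   # shape (2, 2)
print(np.allclose(gram, gram.T))  # True: a Gram matrix is symmetric
```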

Checklist:

  • I have run python util/check_style.py --apply to apply Open3D code style
    to my code.
  • This PR changes Open3D behavior or adds new functionality.
    • Both C++ (Doxygen) and Python (Sphinx / Google style) documentation is
      updated accordingly.
    • I have added or updated C++ and / or Python unit tests OR included test
      results
      (e.g. screenshots or numbers) here.
  • I will follow up and update the code if CI fails.
  • For fork PRs, I have selected Allow edits from maintainers.

Description

Added Gram() and RowGram() functions to Tensor. These are intended for <=2D tensors, similar to how T() is implemented.

@update-docs

update-docs bot commented Nov 28, 2025

Thanks for submitting this pull request! The maintainers of this repository would appreciate if you could update the CHANGELOG.md based on your changes.

@ssheorey
Member

ssheorey commented Jan 5, 2026

Hi @TwentyPast4, thanks for adding this interesting new operation. Where is this used in Open3D, or a 3D workflow? How much of a performance gain does it provide to that workflow?

@TwentyPast4
Contributor Author

TwentyPast4 commented Jan 6, 2026

> Hi @TwentyPast4, thanks for adding this interesting new operation. Where is this used in Open3D, or a 3D workflow? How much of a performance gain does it provide to that workflow?

The current uses of this operation that I could find in the Open3D codebase are:

  • for Gram in RGBDOdometry testing, Ttrans.T().Matmul(Ttrans)
  • for RowGram in FilterGaussian, mask.Matmul(mask.T())

The workflow is solving least-squares problems, for example fitting a conic to point clouds. This is useful when working with point clouds that contain objects of a known shape whose properties you want to measure.
My specific use case is fitting a cone to point cloud scans of tree trunks.
I would love to also include a full workflow with pre-made code for fitting a conic (or other 3D shapes) to point cloud data, but I'm afraid I do not have the time for that at the moment.
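To illustrate where the Gram matrix appears in that workflow, here is a NumPy sketch (not Open3D code) of ordinary least squares via the normal equations, using a hypothetical quadratic fit; the matrix `A.T @ A` in the normal equations is exactly what a copy-free Gram() would compute.

```python
import numpy as np

# Hypothetical example: fit y = c0 + c1*x + c2*x^2 by least squares.
# The normal equations (A.T @ A) c = A.T @ y use the Gram matrix of the
# design matrix A.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 - 1.0 * x + 0.5 * x**2 + 0.01 * rng.standard_normal(50)

A = np.stack([np.ones_like(x), x, x**2], axis=1)  # design matrix, shape (50, 3)
gram = A.T @ A                                    # Gram matrix, shape (3, 3)
coeffs = np.linalg.solve(gram, A.T @ y)
print(coeffs)  # close to [2.0, -1.0, 0.5]
```

(For ill-conditioned problems a QR or SVD solver is preferred over the normal equations, but the Gram-matrix form is the one that benefits from this PR.)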

The performance difference is similar for Gram and RowGram: on my hardware it is a 6-7x speedup on CPU and about 2x on GPU (3.60 s vs 22.25 s on CPU, 14.01 s vs 24.37 s on GPU).
This is the perf test I ran for this data:

TEST_P(LinalgPermuteDevices, GramPerf) {
    core::Device device = GetParam();

    // Gram test.
    core::Tensor A = core::Tensor::Init<float>({{1, 2, 3}, {4, 5, 6}}, device);

    auto start = std::chrono::steady_clock::now();
    core::Tensor B;
    for (int i = 0; i < 1000000; ++i) {
        B = A.Gram();
    }
    auto after_gram = std::chrono::steady_clock::now();
    for (int i = 0; i < 1000000; ++i) {
        B = A.T().Matmul(A);
    }
    auto finish = std::chrono::steady_clock::now();

    double elapsed_gram =
            std::chrono::duration_cast<std::chrono::microseconds>(
                    after_gram - start).count() * 1e-6;
    double elapsed_matmul =
            std::chrono::duration_cast<std::chrono::microseconds>(
                    finish - after_gram).count() * 1e-6;
    EXPECT_LT(elapsed_gram, elapsed_matmul);
}

It should be noted that compiler optimizations may skew this kind of benchmark (e.g. the unused intermediate results could in principle be optimized away), but the numbers should be ballpark-accurate.
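For reference, the totals above convert to per-call times as follows (simple arithmetic on the reported numbers over 1,000,000 iterations):

```python
# Convert the reported benchmark totals to per-call times and speedups.
iters = 1_000_000
for label, gram_s, matmul_s in [("CPU", 3.60, 22.25), ("GPU", 14.01, 24.37)]:
    per_gram_us = gram_s / iters * 1e6
    per_matmul_us = matmul_s / iters * 1e6
    print(f"{label}: {per_gram_us:.2f} us/call (Gram) vs "
          f"{per_matmul_us:.2f} us/call (T+Matmul), "
          f"{matmul_s / gram_s:.1f}x speedup")
```

This works out to roughly 3.6 us vs 22.3 us per call on CPU (about 6.2x) and 14.0 us vs 24.4 us on GPU (about 1.7x).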
