fast_transpose slower than naive_transpose

I've been running various DFT calculations with PSI4 inside Intel's VTune, and `gg_fast_transpose` popped up as a top hotspot. As a test I swapped it for `gg_naive_transpose` and saw a significant speed-up (50% for a C60 test) for that function.

It's only 4-5% of the total CPU time for single-points, so not a real bottleneck to worry about, but I'm wondering why the blocked transpose might be so much slower.
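
For reference, here is a minimal sketch of the two strategies being compared. It is not the gau2grid code: the function names, argument order, and the `block` parameter are hypothetical, and it assumes dense row-major `double` arrays with no padding or alignment requirements.

```c
#include <stddef.h>

/* Hypothetical names and signatures, for illustration only. */

/* Naive transpose: in is rows x cols (row-major), out is cols x rows. */
void naive_transpose_sketch(size_t rows, size_t cols,
                            const double *in, double *out) {
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            out[j * rows + i] = in[i * cols + j];
        }
    }
}

/* Blocked transpose: same result, but the matrix is walked in
 * block x block tiles so that reads and writes within a tile stay
 * close together in memory. Assumes block > 0. */
void blocked_transpose_sketch(size_t rows, size_t cols, size_t block,
                              const double *in, double *out) {
    for (size_t ib = 0; ib < rows; ib += block) {
        /* Clamp the tile at the matrix edge. */
        size_t imax = (ib + block < rows) ? ib + block : rows;
        for (size_t jb = 0; jb < cols; jb += block) {
            size_t jmax = (jb + block < cols) ? jb + block : cols;
            for (size_t i = ib; i < imax; i++) {
                for (size_t j = jb; j < jmax; j++) {
                    out[j * rows + i] = in[i * cols + j];
                }
            }
        }
    }
}
```

In a sketch like this, tiling tends to pay off only when the matrix is too large for cache and the tile size suits the hardware. For small or very skinny matrices the extra loop and edge handling may cost more than it saves, which could be one explanation worth checking against the shapes seen in the C60 run.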

fast_transpose slower than naive_transpose #62