Open
Changes from 28 commits
Commits
31 commits
1a81c71
reinterpretation
arnavk23 Nov 2, 2025
8c65fc4
documentation
arnavk23 Nov 2, 2025
fd56be7
rechanging the implementation
arnavk23 Nov 2, 2025
643a750
making changes
arnavk23 Nov 3, 2025
76a6e04
Update Project.toml
arnavk23 Nov 3, 2025
355f814
Merge branch 'master' into master
arnavk23 Nov 3, 2025
779f391
Import Base.reinterpret for improved functionality
arnavk23 Nov 3, 2025
d909fb1
h
arnavk23 Nov 3, 2025
17c6754
reimplementing tests
arnavk23 Nov 3, 2025
e081b7a
changes acc. to review
arnavk23 Nov 5, 2025
42ba400
running JuliaFormatter
arnavk23 Nov 5, 2025
78a3554
Merge branch 'master' into master
arnavk23 Nov 5, 2025
2d3938b
Update test/test_reinterpret.jl
arnavk23 Nov 14, 2025
7e5e5ee
adding review changes
arnavk23 Nov 14, 2025
4618431
Merge branch 'master' into master
arnavk23 Nov 14, 2025
61f3045
changes
arnavk23 Nov 15, 2025
0480dff
reinterpret
arnavk23 Nov 15, 2025
db8df1e
reinterpret.jl
arnavk23 Nov 15, 2025
71a0a33
test
arnavk23 Nov 15, 2025
fb502a4
documentation for fastrow and fastcolumn data structures
arnavk23 Dec 5, 2025
9517be5
Delete src/reinterpret.jl
arnavk23 Dec 5, 2025
f89befb
Delete test/test_reinterpret.jl
arnavk23 Dec 5, 2025
9800240
Update QuantumClifford.jl
arnavk23 Dec 5, 2025
664b1ce
Merge branch 'master' into issue/474-fastrow-fastcolumn-docs
arnavk23 Dec 5, 2025
f49d719
Merge branch 'master' into issue/474-fastrow-fastcolumn-docs
arnavk23 Dec 6, 2025
0db9d70
Merge branch 'master' into issue/474-fastrow-fastcolumn-docs
arnavk23 Dec 7, 2025
814cd49
Merge branch 'master' into issue/474-fastrow-fastcolumn-docs
arnavk23 Dec 25, 2025
c10984d
clarification: Update FastRow and FastColumn documentation
arnavk23 Dec 27, 2025
24ef384
Documentation improvements for fastrow/fastcolumn layout functions
arnavk23 Jan 4, 2026
27629c7
Merge branch 'master' into issue/474-fastrow-fastcolumn-docs
arnavk23 Jan 4, 2026
f7ca6ec
Clarify fastrow/fastcolumn documentation and state defaults
arnavk23 Jan 6, 2026
35 changes: 32 additions & 3 deletions docs/src/datastructures.md
@@ -71,7 +71,36 @@ Notice the results when the projection operator commutes with the state but is n

We do not use boolean arrays to store information about the qubits as this would be wasteful (7 out of 8 bits in the boolean would be unused). Instead, we use all 8 bits in a byte and perform bitwise logical operations as necessary. Implementation details of the object in RAM can matter for performance. The library permits any of the standard `UInt` types to be used for packing the bits, and larger `UInt` types (like `UInt64`) are usually faster as they permit working on 64 qubits at a time (instead of 1 if we used a boolean, or 8 if we used a byte).
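
The packing idea described above can be sketched in plain Julia. This is only an illustration of the technique: `pack_bits` is a hypothetical helper written for this example and is not the library's internal representation or API.

```julia
# Illustrative sketch: pack a vector of Bools into UInt64 words, 64 bits per word.
# `pack_bits` is a hypothetical helper, not a QuantumClifford function.
function pack_bits(bits::AbstractVector{Bool})
    nwords = cld(length(bits), 64)      # number of 64-bit words needed
    words = zeros(UInt64, nwords)
    for (i, b) in enumerate(bits)
        if b
            word, offset = divrem(i - 1, 64)
            words[word + 1] |= UInt64(1) << offset
        end
    end
    return words
end

a = pack_bits(rand(Bool, 200))
b = pack_bits(rand(Bool, 200))

# A single XOR per word combines 64 bits at a time.
c = a .⊻ b

# A single popcount per word counts the positions where both inputs are set.
overlap = sum(count_ones, a .& b)
```

With 200 bits the packed form needs only 4 words, so the XOR above is 4 machine-word operations rather than 200 boolean ones.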

Moreover, how a tableau is stored in memory can affect performance, as a row-major storage
usually permits more efficient use of the CPU cache (for the particular algorithms we use).
### Memory Layout: Row-Major vs Column-Major

Both of these parameters are [benchmarked](bench_intsize.png) (testing the application of a Pauli operator, which is an $\mathcal{O}(n^2)$ operation; and testing the canonicalization of a Stabilizer, which is an $\mathcal{O}(n^3)$ operation). Row-major UInt64 is the best performing and it is used by default in this library.
How a tableau is stored in memory significantly affects performance, as different memory layouts provide better cache locality for different operations.

The library uses **row-major (fastrow) layout by default**, where each Pauli string (row of the tableau) is stored contiguously in memory. This layout is optimized for:
- **Canonicalization operations** (`canonicalize!`) - $\mathcal{O}(n^3)$ operations
- **Projective measurements** (`project!`) - which frequently iterate over rows

The alternative **column-major (fastcolumn) layout** stores tableau columns (mostly) contiguously in memory. This layout is optimized for:
- **Applying sparse gates** like `apply!(s, sCNOT(i,j))` - updates that touch only a few qubits (columns) in every row
- **Pauli multiplications** (left or right)
Contributor


Do we actually have the data for this? From the GPU side, I would reckon that both styles should execute nearly equally fast if implemented properly (assuming no phase information is desired, otherwise row-major would likely reign supreme).
Moreover, row-major multiplication is more straightforward to parallelise/vectorise/distribute/etc. as it eliminates any inter-task dependency.

Member


I think these are all the benchmarks that exist (and they are pretty outdated and not auto-executed in CI):

https://qc.quantumsavory.org/stable/bench_intsize.png


#### Converting Between Layouts

The functions [`fastrow`](@ref) and [`fastcolumn`](@ref) can be used to convert between memory layouts without changing the logical content of the tableau:

```julia
s = random_stabilizer(1000) # Uses default fastrow layout
s_col = fastcolumn(copy(s)) # Convert to column-major layout
s_row = fastrow(copy(s_col)) # Convert back to row-major layout
```

These functions work on all stabilizer data structures: [`Stabilizer`](@ref), [`Destabilizer`](@ref), [`MixedStabilizer`](@ref), and [`MixedDestabilizer`](@ref).
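
As a minimal sketch of what "without changing the logical content" means in practice, the check below compares the binary (X/Z bit) representation of a stabilizer before and after conversion. It assumes `QuantumClifford` is installed and uses `stab_to_gf2`, the same comparison used in `test/test_bitpack.jl`; the variable names are illustrative.

```julia
using QuantumClifford

s = random_stabilizer(20)        # default layout
s_col = fastcolumn(copy(s))      # column-major copy
s_row = fastrow(copy(s_col))     # converted back to row-major

# The bit content is the same regardless of the memory layout.
same_bits = stab_to_gf2(s) == stab_to_gf2(s_col) == stab_to_gf2(s_row)
```

Note that `stab_to_gf2` is used here only as a layout-independent view of the tableau; the conversion itself happens in `fastcolumn` and `fastrow`.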

#### Performance Implications

The default row-major (`fastrow`) layout is generally the best choice for typical operations on the CPU. However, if your code performs many sparse gate applications on a specific qubit set, converting to column-major layout may be beneficial.

**Note:** The performance claims above are based on CPU benchmarks. On GPU, the optimal memory layout may differ due to differences in memory access patterns and hardware architecture. Users interested in GPU performance are encouraged to benchmark both layouts for their workloads and to contribute results or suggestions.
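
For readers who want to compare the two layouts on their own workload, a rough timing sketch follows. It is not one of the project's benchmark scripts: it uses only `@elapsed` from Base, the tableau size of 500 is an arbitrary choice, and each measurement also includes the cost of `copy`. A dedicated benchmarking package would give more reliable numbers.

```julia
using QuantumClifford

s_row = fastrow(random_stabilizer(500))   # row-major tableau
s_col = fastcolumn(copy(s_row))           # same content, column-major

# Run each call once first so that compilation time is not measured.
canonicalize!(copy(s_row))
canonicalize!(copy(s_col))

# Time canonicalization on fresh copies of each layout.
t_row = @elapsed canonicalize!(copy(s_row))
t_col = @elapsed canonicalize!(copy(s_col))

println("canonicalize!: fastrow = ", t_row, " s, fastcolumn = ", t_col, " s")
```

A single run like this is noisy, so repeat the measurement several times and at the sizes you actually care about before drawing conclusions.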

The test suite (see e.g. `test/test_bitpack.jl`) only verifies that both memory layouts produce identical results for all operations; it does **not** compare their performance. Actual performance comparisons are performed using scripts in the `benchmark/` directory, which are designed to generate benchmark results suitable for automatic inclusion in the documentation. If you wish to contribute new benchmarks or update performance data, please refer to the scripts in `benchmark/`.

Both of these parameters are [benchmarked](bench_intsize.png) (testing the application of a Pauli operator, which is an $\mathcal{O}(n^2)$ operation; and testing the canonicalization of a Stabilizer, which is an $\mathcal{O}(n^3)$ operation) on CPU. Row-major UInt64 is the best performing and it is used by default in this library for CPU workloads.
31 changes: 31 additions & 0 deletions test/test_bitpack.jl
@@ -80,4 +80,35 @@
            @test stab_to_gf2(s) == stab_to_gf2(sr) == stab_to_gf2(sc) == stab_to_gf2(s8) == stab_to_gf2(s8r) == stab_to_gf2(s8c)
        end
    end

    @testset "memory layout equivalence (fastrow vs fastcolumn)" begin
        # This testset checks correctness only; it does not measure speed
        s_row = fastrow(random_stabilizer(100, 128))
        s_col = fastcolumn(copy(s_row))

        # Both layouts should produce identical results
        result_row = canonicalize!(copy(s_row); phases=true)
        result_col = canonicalize!(copy(s_col); phases=true)
        @test stab_to_gf2(result_row) == stab_to_gf2(result_col)

        # Test sparse gate application
        s_row_gates = fastrow(random_stabilizer(50, 64))
        s_col_gates = fastcolumn(copy(s_row_gates))
        gate = sCNOT(1, 2)

        # Apply sparse gates and verify identity
        s_row_after = apply!(copy(s_row_gates), gate)
        s_col_after = apply!(copy(s_col_gates), gate)
        @test stab_to_gf2(s_row_after) == stab_to_gf2(s_col_after)

        # Test dense clifford operator application
        s_row_clif = fastrow(random_stabilizer(50, 64))
        s_col_clif = fastcolumn(copy(s_row_clif))
        c = CliffordOperator(random_destabilizer(64; phases=false))

        # Apply dense gates and verify identity
        s_row_clif_after = apply!(copy(s_row_clif), c)
        s_col_clif_after = apply!(copy(s_col_clif), c)
        @test stab_to_gf2(s_row_clif_after) == stab_to_gf2(s_col_clif_after)
    end
end