Parallelization in [email protected], compared with [email protected] #236

@ZongYongyue

Description

Hi Lukas,

I am exploring the new version of MPSKit. Compared with [email protected], [email protected] seems to have dropped some support for parallel computation, especially for finite-size systems and algorithms.

For example, setting MPSKit.Defaults.set_parallelization("derivatives" => true) was useful when I performed DMRG2 on a finite-size lattice in the previous version:

# julia -t 8 hubbard_m.jl
using LinearAlgebra
BLAS.set_num_threads(1)
using TensorKit
using MPSKit
using MPSKitModels: FiniteCylinder, FiniteStrip
using DynamicalCorrelators
using JLD2: save, load

MPSKit.Defaults.set_parallelization("derivatives" => true)
filling = (1, 1)
lattice = FiniteStrip(4, 12)
H = hubbard(Float64, U1Irrep, U1Irrep, lattice; filling=filling, t=1, U=8, μ=0)
N = length(lattice)
st = randFiniteMPS(Float64, U1Irrep, U1Irrep, N; filling=filling)
err = 1e-6
@time gs, envs, delta = find_groundstate(st, H, DMRG2(trscheme=truncerr(err)));
E0 = expectation_value(gs, H)

[ Info: DMRG2   1:      obj = -4.911146147560e+00       err = 9.9914573155e-01  time = 51.10 sec
[ Info: DMRG2   2:      obj = -4.913259207333e+00       err = 1.6855045657e-04  time = 1.29 min
[ Info: DMRG2   3:      obj = -4.913259209043e+00       err = 2.7500646205e-10  time = 24.01 sec
[ Info: DMRG2 conv 4:   obj = -4.913259209043e+00       err = 4.3298697960e-14  time = 3.27 min
209.001721 seconds (1.13 G allocations: 624.127 GiB, 21.34% gc time, 288 lock conflicts, 38.48% compilation time)
-4.913259209043462
# For a single thread, it costs:
[ Info: DMRG2   1:      obj = -4.912078856370e+00       err = 9.9976380773e-01  time = 1.77 min
[ Info: DMRG2   2:      obj = -4.913259207169e+00       err = 1.0282147643e-04  time = 1.61 min
[ Info: DMRG2   3:      obj = -4.913259209043e+00       err = 3.0078417534e-10  time = 57.36 sec
[ Info: DMRG2 conv 4:   obj = -4.913259209043e+00       err = 4.4075854078e-14  time = 5.79 min
357.544613 seconds (914.24 M allocations: 572.318 GiB, 16.88% gc time, 4.63% compilation time)
-4.9132592090434555

But a lot has changed in the new version: I noticed that the two-site derivative function ∂AC2 no longer uses multiple threads, which seems to remove the last parallelization support for finite-size DMRG2. The timings below confirm this:

# 8 threads in [email protected] with MPSKit.Defaults.set_scheduler!(:dynamic)
[ Info: DMRG2   1:      obj = -4.911500799784e+00       err = 9.4429522527e-01  time = 1.26 min
[ Info: DMRG2   2:      obj = -4.913259207112e+00       err = 2.2780024577e-04  time = 48.19 sec
[ Info: DMRG2   3:      obj = -4.913259209043e+00       err = 2.9917901490e-10  time = 35.59 sec
[ Info: DMRG2 conv 4:   obj = -4.913259209043e+00       err = 4.3964831775e-14  time = 4.68 min
291.332853 seconds (888.91 M allocations: 450.021 GiB, 11.08% gc time, 15.20% compilation time: <1% of which was recompilation)
-4.913259209043462
# single thread in [email protected]
[ Info: DMRG2   1:      obj = -4.911706979069e+00       err = 9.9925886189e-01  time = 1.06 min
[ Info: DMRG2   2:      obj = -4.913259207177e+00       err = 1.6968202792e-04  time = 52.09 sec
[ Info: DMRG2   3:      obj = -4.913259209043e+00       err = 3.0106028781e-10  time = 34.27 sec
[ Info: DMRG2 conv 4:   obj = -4.913259209043e+00       err = 4.4408920985e-14  time = 4.58 min
284.791112 seconds (762.70 M allocations: 432.346 GiB, 16.69% gc time, 8.58% compilation time: <1% of which was recompilation)
-4.913259209043463

For single-threaded computations, the new version has a clear advantage. However, since the new version no longer supports multithreading for finite-size DMRG, it is at a disadvantage compared to the previous version when more threads are available.
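For context, here is a minimal sketch of the kind of per-term parallelism that "derivatives" => true enabled in [email protected]: one task per term of the effective Hamiltonian, summed after all tasks finish. This is not MPSKit's actual implementation, just the generic pattern; the helper name `apply_terms` is hypothetical.

```julia
# Sketch only (NOT MPSKit internals): apply H = sum_i H_i to a vector x,
# spawning one task per term and summing the results.
using LinearAlgebra

function apply_terms(terms::Vector{<:AbstractMatrix}, x::AbstractVector)
    # Each term runs on its own task; with `julia -t N` these run in parallel.
    tasks = [Threads.@spawn(H_i * x) for H_i in terms]
    return sum(fetch.(tasks))
end

terms = [rand(64, 64) for _ in 1:8]
x = rand(64)
y = apply_terms(terms, x)
```

The result is identical to the serial sum; only the per-term applications are distributed over threads, which is why the speedup shows up in the ∂AC2-dominated DMRG2 sweeps.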
