# [Intensive parallelization on a Cluster](@id doc-tutor:Intensive-parallelization-on-a-Cluster)

## Introduction

In this tutorial, we will demonstrate how to seamlessly perform intensive parallelization on a cluster using the **QuantumToolbox.jl** package. Thanks to the [**Distributed.jl**](https://docs.julialang.org/en/v1/manual/distributed-computing/) and [**ClusterManagers.jl**](https://github.com/JuliaParallel/ClusterManagers.jl) packages, it is possible to parallelize on a cluster with minimal effort. The following examples use a cluster managed by the [SLURM](https://slurm.schedmd.com/documentation.html) workload manager, but the same principles can be applied to other workload managers, as the [**ClusterManagers.jl**](https://github.com/JuliaParallel/ClusterManagers.jl) package is very versatile.
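
Before running the examples, the project environment used by the jobs must contain the packages loaded below (`Distributed` is a standard library and does not need to be installed). A minimal setup sketch, assuming the project lives in the directory from which the job will be submitted:

```julia
using Pkg

# Activate the project in the current directory and install the packages used in this tutorial
Pkg.activate(".")
Pkg.add(["ClusterManagers", "QuantumToolbox", "OrdinaryDiffEq"])
```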

We will consider two examples:

1. **Parallelization of Monte Carlo quantum trajectories**
2. **Parallelization of a Master Equation by sweeping over parameters**

### Monte Carlo Quantum Trajectories

Let's consider a two-dimensional transverse field Ising model with a 4×3 lattice of spins. The Hamiltonian is given by
```math
\hat{H} = \frac{J_z}{2} \sum_{\langle i,j \rangle} \hat{\sigma}_i^z \hat{\sigma}_j^z + h_x \sum_i \hat{\sigma}_i^x \, ,
```

where the sums are over nearest neighbors, and the collapse operators are given by

```math
\hat{c}_i = \sqrt{\gamma} \hat{\sigma}_i^- \, .
```

We start by creating a file named `run.batch` with the following content:

```bash
#!/bin/bash
#SBATCH --job-name=tutorial
#SBATCH --output=output.out
#SBATCH --account=your_account
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=72
#SBATCH --mem=128GB
#SBATCH --time=0:10:00
#SBATCH --qos=parallel

# Set PATH to include the directory of your custom Julia installation
export PATH=/home/username/.juliaup/bin:$PATH

# Now run Julia
julia --project script.jl
```

where we have to replace `your_account` with the name of your account. This script will be used to submit the job to the cluster. Here, we are requesting 10 nodes, with one task per node and 72 CPUs per task, for a total of 720 CPUs (10 worker processes with 72 threads each). The `--time` flag specifies the maximum time that the job can run. To see all the available options, you can check the [SLURM documentation](https://slurm.schedmd.com/documentation.html). We also add the directory of the custom Julia installation to the `PATH`, which is necessary to run the script (replace `username` with your username). Finally, we run the script `script.jl` with the command `julia --project script.jl`.
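
Once both files are ready, the job is submitted and monitored with the usual SLURM commands, for example:

```bash
# Submit the job described in run.batch
sbatch run.batch

# Check the status of your jobs in the queue
squeue -u $USER

# Follow the output file while the job runs
tail -f output.out
```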

The `script.jl` file contains the following content:

```julia
using Distributed
using ClusterManagers

const SLURM_NUM_TASKS = parse(Int, ENV["SLURM_NTASKS"])
const SLURM_CPUS_PER_TASK = parse(Int, ENV["SLURM_CPUS_PER_TASK"])

exeflags = ["--project=.", "-t $SLURM_CPUS_PER_TASK"]
addprocs(SlurmManager(SLURM_NUM_TASKS); exeflags=exeflags, topology=:master_worker)


println("################")
println("Hello! You have $(nworkers()) workers with $(remotecall_fetch(Threads.nthreads, 2)) threads each.")

println("----------------")


println("################")

flush(stdout)

@everywhere begin
    using QuantumToolbox
    using OrdinaryDiffEq

    BLAS.set_num_threads(1)
end

# Define the lattice

Nx = 4
Ny = 3
latt = Lattice(Nx = Nx, Ny = Ny)

# Define the Hamiltonian and collapse operators
Jx = 0.0
Jy = 0.0
Jz = 1.0
hx = 0.2
hy = 0.0
hz = 0.0
γ = 1

Sx = mapreduce(i->SingleSiteOperator(sigmax(), i, latt), +, 1:latt.N)
Sy = mapreduce(i->SingleSiteOperator(sigmay(), i, latt), +, 1:latt.N)
Sz = mapreduce(i->SingleSiteOperator(sigmaz(), i, latt), +, 1:latt.N)

H, c_ops = DissipativeIsing(Jx, Jy, Jz, hx, hy, hz, γ, latt; boundary_condition = Val(:periodic_bc), order = 1)
e_ops = [Sx, Sy, Sz]

# Time Evolution

ψ0 = fock(2^latt.N, 0, dims = ntuple(i->2, Val(latt.N))) # initial product state |0⟩^⊗N

tlist = range(0, 10, 100)

sol_mc = mcsolve(H, ψ0, tlist, c_ops, e_ops=e_ops, ntraj=5000, ensemble_method=EnsembleSplitThreads())

##

println("FINISH!")

rmprocs(workers()) # release the workers
```

In this script, we first load the packages needed for distributed computing on the cluster (`Distributed.jl` and `ClusterManagers.jl`). From the environment variables set by SLURM (according to the directives in the batch script), we read the number of tasks and the number of CPUs per task. We then initialize the distributed network with the `addprocs(SlurmManager(SLURM_NUM_TASKS); exeflags=exeflags, topology=:master_worker)` command. Next, we import the packages with the `@everywhere` macro, which loads them on all the workers. Moreover, to avoid conflicts between the multithreading of the BLAS library and the native Julia multithreading, we set the number of BLAS threads to 1 with the `BLAS.set_num_threads(1)` command. More information about this can be found [here](https://docs.julialang.org/en/v1/manual/performance-tips/#man-multithreading-linear-algebra).
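
As a side note, if you want to test the script on a local machine before submitting it to the cluster, the SLURM-specific initialization can be replaced by a plain `addprocs` call from **Distributed.jl**. A minimal sketch (the number of workers and threads here is arbitrary):

```julia
using Distributed

# Spawn 4 local worker processes, each with 2 Julia threads and the current project environment
addprocs(4; exeflags=["--project=.", "-t 2"], topology=:master_worker)

println("You have $(nworkers()) local workers.")
```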

With the

```julia
println("Hello! You have $(nworkers()) workers with $(remotecall_fetch(Threads.nthreads, 2)) threads each.")
```

command, we test that the distributed network is correctly initialized. The `remotecall_fetch(Threads.nthreads, 2)` command returns the number of threads of the worker with ID 2.
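
To inspect all the workers instead of only the one with ID 2, you can loop over `workers()`, for example:

```julia
# Print the number of Julia threads available on each worker
for id in workers()
    println("Worker $id has $(remotecall_fetch(Threads.nthreads, id)) threads")
end
flush(stdout)
```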

We then write the main part of the script, where we define the lattice through the [`Lattice`](@ref) function. We set the parameters and define the Hamiltonian and collapse operators with the [`DissipativeIsing`](@ref) function. We also define the list of expectation operators `e_ops` and the initial state `ψ0`. We then run the Monte Carlo quantum trajectories with the [`mcsolve`](@ref) function. The `ensemble_method=EnsembleSplitThreads()` argument parallelizes the trajectories by splitting the ensemble among the workers, with each worker running its batch on its own threads. For a more detailed explanation of the different ensemble methods, you can check the [official documentation](https://docs.sciml.ai/DiffEqDocs/stable/features/ensemble/) of the [**DifferentialEquations.jl**](https://github.com/SciML/DifferentialEquations.jl/) package. At the end, the `rmprocs(workers())` command removes the workers after the computation is finished.
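
Before removing the workers, you may want to save the results to disk. A minimal sketch, assuming the trajectory-averaged expectation values are stored in `sol_mc.expect` (one row per operator in `e_ops`):

```julia
using DelimitedFiles

# Save the times together with the averaged ⟨Sx⟩, ⟨Sy⟩ and ⟨Sz⟩ (one column each)
expvals = real.(sol_mc.expect)  # assumed to be a 3 × length(tlist) array
writedlm("magnetization.dat", hcat(collect(tlist), transpose(expvals)))
```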

The output of the script will be printed in the `output.out` file, and it will look similar to the following:

```
################
Hello! You have 10 workers with 72 threads each.
----------------
################

Progress: [==============================] 100.0% --- Elapsed Time: 0h 00m 21s (ETA: 0h 00m 00s)

FINISH!
```

where we can see that the computation **lasted only 21 seconds**.

### Master Equation by Sweeping Over Parameters

In this example, we will consider a driven Jaynes-Cummings model, describing a two-level atom interacting with a driven cavity mode. The Hamiltonian is given by

```math
\hat{H} = \omega_c \hat{a}^\dagger \hat{a} + \frac{\omega_q}{2} \hat{\sigma}_z + g (\hat{a} \hat{\sigma}_+ + \hat{a}^\dagger \hat{\sigma}_-) + F \cos(\omega_d t) (\hat{a} + \hat{a}^\dagger) \, ,
```

and the collapse operators are given by

```math
\hat{c}_1 = \sqrt{\gamma} \hat{a} \, , \quad \hat{c}_2 = \sqrt{\gamma} \hat{\sigma}_- \, .
```

The SLURM file is the same as before, but the `script.jl` file now contains the following content:

```julia
using Distributed
using ClusterManagers

const SLURM_NUM_TASKS = parse(Int, ENV["SLURM_NTASKS"])
const SLURM_CPUS_PER_TASK = parse(Int, ENV["SLURM_CPUS_PER_TASK"])

exeflags = ["--project=.", "-t $SLURM_CPUS_PER_TASK"]
addprocs(SlurmManager(SLURM_NUM_TASKS); exeflags=exeflags, topology=:master_worker)


println("################")
println("Hello! You have $(nworkers()) workers with $(remotecall_fetch(Threads.nthreads, 2)) threads each.")

println("----------------")


println("################")

flush(stdout)

@everywhere begin
    using QuantumToolbox
    using OrdinaryDiffEq

    BLAS.set_num_threads(1)
end

@everywhere begin
    const Nc = 20
    const ωc = 1.0
    const g = 0.05
    const γ = 0.01
    const F = 0.01

    const a = tensor(destroy(Nc), qeye(2))

    const σm = tensor(qeye(Nc), sigmam())
    const σp = tensor(qeye(Nc), sigmap())

    H(ωq) = ωc * a' * a + ωq / 2 * tensor(qeye(Nc), sigmaz()) + g * (a' * σm + a * σp) # qubit term: ωq/2 σz

    coef(p, t) = p.F * cos(p.ωd * t) # coefficient for the time-dependent term

    const c_ops = [sqrt(γ) * a, sqrt(γ) * σm]
    const e_ops = [a' * a]
end

# Define the ODE problem and the EnsembleProblem generation function

@everywhere begin
    ωq_list = range(ωc - 3*g, ωc + 3*g, 100)
    ωd_list = range(ωc - 3*g, ωc + 3*g, 100)

    const iter = collect(Iterators.product(ωq_list, ωd_list))

    function my_prob_func(prob, i, repeat, channel)
        ωq, ωd = iter[i]
        H_i = H(ωq)
        H_d_i = H_i + QobjEvo(a + a', coef) # Hamiltonian with a driving term

        L = liouvillian(H_d_i, c_ops).data # Make the Liouvillian

        put!(channel, true) # Update the progress bar channel

        remake(prob, f=L, p=(F = F, ωd = ωd))
    end
end

ωq, ωd = iter[1]
H0 = H(ωq) + QobjEvo(a + a', coef)
ψ0 = tensor(fock(Nc, 0), basis(2, 1)) # Ground State
tlist = range(0, 20 / γ, 1000)

prob = mesolveProblem(H0, ψ0, tlist, c_ops, e_ops=e_ops, progress_bar=Val(false), params=(F = F, ωd = ωd))

### Just to print the progress bar
progr = ProgressBar(length(iter))
progr_channel::RemoteChannel{Channel{Bool}} = RemoteChannel(() -> Channel{Bool}(1))
###
ens_prob = EnsembleProblem(prob.prob, prob_func=(prob, i, repeat) -> my_prob_func(prob, i, repeat, progr_channel))


@sync begin
    @async while take!(progr_channel)
        next!(progr)
    end

    @async begin
        sol = solve(ens_prob, Tsit5(), EnsembleSplitThreads(), trajectories = length(iter))
        put!(progr_channel, false)
    end
end
```

We use the [`mesolveProblem`](@ref) function to define the master equation problem. We added some code to manage the progress bar, which is updated through a `RemoteChannel`. The `prob_func` argument of the `EnsembleProblem` constructor defines the function that generates the problem for each iteration. The `iter` variable contains the Cartesian product of the `ωq_list` and `ωd_list` ranges, which defines the grid of parameters to sweep over. The `sol = solve(ens_prob, Tsit5(), EnsembleSplitThreads(), trajectories = length(iter))` command solves the ensemble problem, splitting the iterations among the workers. The output of the script will be printed in the `output.out` file, with content similar to the previous example.
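
Each element of the ensemble solution corresponds to one `(ωq, ωd)` pair, in the same linear order as `iter`, so a vector of per-point results can be mapped back onto the two-dimensional grid. A minimal post-processing sketch, where `photon_numbers` is a placeholder for whatever quantity you extract from each solution:

```julia
# Map a vector of per-point results back onto the (ωq, ωd) grid
photon_numbers = rand(length(iter))                # placeholder: one value per (ωq, ωd) pair
result_grid = reshape(photon_numbers, size(iter))  # size(iter) == (length(ωq_list), length(ωd_list))

# result_grid[i, j] corresponds to ωq_list[i] and ωd_list[j]
```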