| 1 | +# Parallel Computing with OhMyThreads.jl |
| 2 | + |
| 3 | +SolarPosition.jl provides a parallel computing extension using [`OhMyThreads.jl`](https://github.com/JuliaFolds2/OhMyThreads.jl) |
| 4 | +for efficient multithreaded solar position calculations across large time series. This |
| 5 | +extension is particularly useful when processing thousands of timestamps, where |
| 6 | +parallelization can provide significant speedups. |
| 7 | + |
| 8 | +## Installation |
| 9 | + |
| 10 | +The OhMyThreads extension is loaded automatically when both [`SolarPosition.jl`](https://github.com/JuliaAstro/SolarPosition.jl) and [`OhMyThreads.jl`](https://github.com/JuliaFolds2/OhMyThreads.jl) |
| 11 | +are loaded: |
| 12 | + |
| 13 | +```julia |
| 14 | +using SolarPosition |
| 15 | +using OhMyThreads |
| 16 | +``` |
| 17 | + |
| 18 | +!!! note "Thread Configuration" |
| 19 | + Julia must be started with multiple threads to benefit from parallelization. Use |
| 20 | + `julia --threads=auto` or set the `JULIA_NUM_THREADS` environment variable. Check |
| 21 | + the number of available threads with `Threads.nthreads()`. |
| 22 | + |
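Before running the examples, it can help to confirm that the session is actually multithreaded. The following is a minimal sketch using only Base; the message wording is illustrative and not part of SolarPosition.jl.

```julia
# Report how many threads this Julia session was started with.
n = Threads.nthreads()

if n == 1
    # With a single thread the scheduler-based methods still run, just without parallel speedup.
    @warn "Julia is running with a single thread; restart with `julia --threads=auto` to benefit from parallelization."
else
    @info "Julia is running with $n threads."
end
```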
| 23 | +## Quick Start |
| 24 | + |
| 25 | +The extension adds new methods to [`solar_position`](@ref) and [`solar_position!`](@ref) |
| 26 | +that accept an `OhMyThreads.Scheduler` as the last argument. These methods automatically |
| 27 | +parallelize computations across the provided timestamp vector. |
| 28 | + |
| 29 | +```@example parallel |
| 30 | +using SolarPosition |
| 31 | +using OhMyThreads |
| 32 | +using Dates |
| 33 | +using StructArrays |
| 34 | +
|
| 35 | +# Create observer location |
| 36 | +obs = Observer(51.5, -0.18, 15.0) # London |
| 37 | +
|
| 38 | +# Generate a year of minute timestamps |
| 39 | +times = collect(DateTime(2024, 1, 1):Minute(1):DateTime(2025, 1, 1)) |
| 40 | +
|
| 41 | +# Parallel computation with DynamicScheduler |
| 42 | +t0 = time() |
| 43 | +positions = solar_position(obs, times, PSA(), NoRefraction(), DynamicScheduler()) |
| 44 | +dt_parallel = time() - t0 |
| 45 | +println("Time taken (parallel): $(round(dt_parallel, digits=5)) seconds") |
| 46 | +``` |
| 47 | + |
| 48 | +Now we compare this to the serial version: |
| 49 | + |
| 50 | +```@example parallel |
| 51 | +# Serial computation (no scheduler argument) |
| 52 | +t0 = time() |
| 53 | +positions_serial = solar_position(obs, times, PSA(), NoRefraction()) |
| 54 | +dt_serial = time() - t0 |
| 55 | +println("Time taken (serial): $(round(dt_serial, digits=5)) seconds") |
| 56 | +``` |
| 57 | + |
| 58 | +We observe a speedup of: |
| 59 | + |
| 60 | +```@example parallel |
| 61 | +speedup = dt_serial / dt_parallel |
| 62 | +println("Speedup: $(round(speedup, digits=2))×") |
| 63 | +``` |
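The single-shot `time()` measurements above include Julia's just-in-time compilation on the first call, which can distort the comparison. A rough way to reduce that effect is to run each call once as a warm-up and time a second run. The helper `time_after_warmup` below is hypothetical (it is not part of SolarPosition.jl), and the workload is a stand-in so the sketch runs without any packages.

```julia
# Hypothetical helper: run `f` once to trigger compilation, then return the
# elapsed time in seconds of a second run.
function time_after_warmup(f)
    f()                  # warm-up call, result discarded
    return @elapsed f()  # timed call
end

# Stand-in workload so the sketch is self-contained.
workload() = sum(sqrt, 1:1_000_000)

dt = time_after_warmup(workload)
println("Time taken after warm-up: $(round(dt, digits=5)) seconds")
```

With the variables defined in the Quick Start, the same helper could wrap a call such as `time_after_warmup(() -> solar_position(obs, times, PSA(), NoRefraction(), DynamicScheduler()))`. For careful measurements, BenchmarkTools.jl remains the better tool.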
| 64 | + |
| 65 | +### Simplified Syntax |
| 66 | + |
| 67 | +You can also use the simplified syntax with the scheduler as the third argument, which |
| 68 | +uses the default algorithm (PSA) and no refraction correction: |
| 69 | + |
| 70 | +```@example parallel |
| 71 | +# Simplified syntax with default algorithm |
| 72 | +positions = solar_position(obs, times, DynamicScheduler()) |
| 73 | +@show first(positions, 3) |
| 74 | +``` |
| 75 | + |
| 76 | +## Available Schedulers |
| 77 | + |
| 78 | +OhMyThreads.jl provides different scheduling strategies optimized for various workload |
| 79 | +characteristics: |
| 80 | + |
| 81 | +### DynamicScheduler |
| 82 | + |
| 83 | +The [`DynamicScheduler`](https://juliafolds2.github.io/OhMyThreads.jl/stable/refs/api/#OhMyThreads.DynamicScheduler) |
| 84 | +is the default and recommended scheduler for most workloads. It dynamically balances |
| 85 | +tasks among threads, making it suitable for non-uniform workloads where computation |
| 86 | +times may vary. Please visit the `OhMyThreads.jl` documentation for more details. |
| 87 | + |
| 88 | +```@example parallel |
| 89 | +# Dynamic scheduling (recommended) |
| 90 | +positions = solar_position(obs, times, PSA(), NoRefraction(), DynamicScheduler()); |
| 91 | +nothing # hide |
| 92 | +``` |
| 93 | + |
| 94 | +### StaticScheduler |
| 95 | + |
| 96 | +The [`StaticScheduler`](https://juliafolds2.github.io/OhMyThreads.jl/stable/refs/api/#OhMyThreads.StaticScheduler) |
| 97 | +partitions work statically among threads. This can be more efficient for uniform |
| 98 | +workloads where all computations take approximately the same time. |
| 99 | + |
| 100 | +```@example parallel |
| 101 | +# Static scheduling for uniform workloads |
| 102 | +positions = solar_position(obs, times, PSA(), NoRefraction(), StaticScheduler()) |
| 103 | +nothing # hide |
| 104 | +``` |
| 105 | + |
| 106 | +## In-Place Computation |
| 107 | + |
| 108 | +For maximum performance and minimal allocations, use the in-place version |
| 109 | +[`solar_position!`](@ref) with a pre-allocated [`StructVector`](https://github.com/JuliaArrays/StructArrays.jl): |
| 110 | + |
| 111 | +```@example parallel |
| 112 | +using StructArrays |
| 113 | +
|
| 114 | +# Pre-allocate output array |
| 115 | +positions = StructVector{SolPos{Float64}}(undef, length(times)) |
| 116 | +
|
| 117 | +# Compute in-place |
| 118 | +solar_position!(positions, obs, times, PSA(), NoRefraction(), DynamicScheduler()) |
| 119 | +nothing # hide |
| 120 | +``` |
| 121 | + |
| 122 | +The in-place version avoids allocating the output array and minimizes intermediate |
| 123 | +allocations, making it ideal for repeated computations or memory-constrained |
| 124 | +environments. |
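To illustrate the "repeated computations" case, the sketch below reuses one pre-allocated buffer for several consecutive days. It relies only on the calls shown on this page; the one-week loop, the `noon_azimuth` vector, and the choice of index 721 (12:00 within a day of minute timestamps) are illustrative assumptions.

```julia
using SolarPosition, OhMyThreads, StructArrays, Dates

obs = Observer(51.5, -0.18, 15.0)  # London, as in the Quick Start

# One buffer sized for a full day of minute timestamps (24 * 60 = 1440 entries).
positions = StructVector{SolPos{Float64}}(undef, 24 * 60)

noon_azimuth = Float64[]
for d in 0:6
    day = DateTime(2024, 6, 1) + Day(d)
    times = collect(day:Minute(1):(day + Day(1) - Minute(1)))

    # Overwrite the same buffer on every iteration instead of allocating a new result.
    solar_position!(positions, obs, times, PSA(), NoRefraction(), DynamicScheduler())

    # Copy out the value of interest before the buffer is reused.
    push!(noon_azimuth, positions.azimuth[721])
end

println("Collected $(length(noon_azimuth)) noon azimuth values")
```

Because the buffer is overwritten on each pass, anything that must outlive the loop iteration has to be copied out first, as done here with `push!`.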
| 125 | + |
| 126 | +## Performance Comparison |
| 127 | + |
| 128 | +Here's a typical performance comparison between serial and parallel execution: |
| 129 | + |
| 130 | +```julia |
| 131 | +using BenchmarkTools |
| 132 | + |
| 133 | +### Serial execution (no scheduler argument) |
| 134 | +@benchmark solar_position($obs, $times, PSA(), NoRefraction()) |
| 135 | +# BenchmarkTools.Trial: 57 samples with 1 evaluation per sample. |
| 136 | +# Range (min … max): 83.994 ms … 98.110 ms ┊ GC (min … max): 0.00% … 12.50% |
| 137 | +# Time (median): 87.907 ms ┊ GC (median): 0.66% |
| 138 | +# Time (mean ± σ): 88.194 ms ± 2.478 ms ┊ GC (mean ± σ): 1.39% ± 2.23% |
| 139 | + |
| 140 | +# ▁ █ |
| 141 | +# ▆▁▄▁▁▁▇▆▆▁▁▇▄█▄▁▇▆▇▇▄▄▁▄▄▆▆█▆▆▄▁▆▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁ |
| 142 | +# 84 ms Histogram: frequency by time 95.8 ms < |
| 143 | + |
| 144 | +# Memory estimate: 12.06 MiB, allocs estimate: 9. |
| 145 | + |
| 146 | +### Parallel execution with DynamicScheduler |
| 147 | +@benchmark solar_position($obs, $times, PSA(), NoRefraction(), DynamicScheduler()) |
| 148 | +# BenchmarkTools.Trial: 312 samples with 1 evaluation per sample. |
| 149 | +# Range (min … max): 7.588 ms … 35.575 ms ┊ GC (min … max): 0.00% … 74.79% |
| 150 | +# Time (median): 14.718 ms ┊ GC (median): 6.16% |
| 151 | +# Time (mean ± σ): 16.026 ms ± 6.387 ms ┊ GC (mean ± σ): 23.51% ± 19.37% |
| 152 | + |
| 153 | +# ▆▆█▁▃▂▅ ▁▁▁▄▁▃▅▁ ▁▄ ▁ |
| 154 | +# █▇███████▄████████▇▄██▆█▇▆▇█▅▆▅▆▄▄▁▅▄▁▁▄▃▄▅▄▃▃▄▃▄▆▃▃▁▄▄▁▃▁▃ ▄ |
| 155 | +# 7.59 ms Histogram: frequency by time 34.4 ms < |
| 156 | + |
| 157 | +# Memory estimate: 66.59 MiB, allocs estimate: 468. |
| 158 | + |
| 159 | +### In-place parallel execution |
| 160 | +pos = StructVector{SolPos{Float64}}(undef, length(times)) |
| 161 | +@benchmark solar_position!($pos, $obs, $times, PSA(), NoRefraction(), DynamicScheduler()) |
| 162 | +# BenchmarkTools.Trial: 908 samples with 1 evaluation per sample. |
| 163 | +# Range (min … max): 4.061 ms … 7.846 ms ┊ GC (min … max): 0.00% … 0.00% |
| 164 | +# Time (median): 5.532 ms ┊ GC (median): 0.00% |
| 165 | +# Time (mean ± σ): 5.501 ms ± 644.881 μs ┊ GC (mean ± σ): 0.00% ± 0.00% |
| 166 | + |
| 167 | +# ▃▅█▄▂▂ ▁ |
| 168 | +# ▃▄▅▃▄▃▃▅▃▄▃▁▃▂▃▄▂▃▂▂▃▂▂▃▂▅▃▃████████▆▅▅▄▆▅██▅▅▆▄▅▄▃▂▃▃▃▃▃▂▂ ▃ |
| 169 | +# 4.06 ms Histogram: frequency by time 6.72 ms < |
| 170 | + |
| 171 | +# Memory estimate: 20.47 KiB, allocs estimate: 284. |
| 172 | + |
| 173 | +### In-place parallel execution with StaticScheduler |
| 174 | +@benchmark solar_position!($pos, $obs, $times, PSA(), NoRefraction(), StaticScheduler()) |
| 175 | +# BenchmarkTools.Trial: 902 samples with 1 evaluation per sample. |
| 176 | +# Range (min … max): 4.027 ms … 7.228 ms ┊ GC (min … max): 0.00% … 0.00% |
| 177 | +# Time (median): 5.842 ms ┊ GC (median): 0.00% |
| 178 | +# Time (mean ± σ): 5.537 ms ± 802.636 μs ┊ GC (mean ± σ): 0.00% ± 0.00% |
| 179 | + |
| 180 | +# ▃▁ ▁▄▃ ▁ ▁▅▆█▂▄▇▇▅▁▃ ▁ |
| 181 | +# ██▃▃▅███▅▆▆▄▄▃▃▃▃▃▂▄▅▂▁▃▆▃▃▂▅▃▅█▇▆▂▃▅▅██████████████▇█▆▇▇▂▂ ▄ |
| 182 | +# 4.03 ms Histogram: frequency by time 6.72 ms < |
| 183 | + |
| 184 | +# Memory estimate: 15.97 KiB, allocs estimate: 220. |
| 185 | +``` |
| 186 | + |
| 187 | +On a system with 32 threads processing 527,041 timestamps (one year at one-minute resolution), using the median times from the benchmarks above: |
| 188 | + |
| 189 | +| Method | Median time | Speedup | Memory | |
| 190 | +|--------|------|---------|-------------| |
| 191 | +| Serial | 87.9 ms | 1.0× | 12.06 MiB | |
| 192 | +| Parallel (DynamicScheduler) | 14.7 ms | **6.0×** | 66.59 MiB | |
| 193 | +| In-place (DynamicScheduler) | 5.53 ms | **15.9×** | 20.47 KiB | |
| 194 | +| In-place (StaticScheduler) | 5.84 ms | **15.0×** | 15.97 KiB | |
| 195 | + |
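The speedup column follows directly from the median times in the benchmark output: each value is the serial median divided by the method's median. A small sketch of that arithmetic, using only Base and the numbers reported above:

```julia
# Median times in milliseconds, copied from the benchmark output above.
medians_ms = [
    "Serial" => 87.907,
    "Parallel (DynamicScheduler)" => 14.718,
    "In-place (DynamicScheduler)" => 5.532,
    "In-place (StaticScheduler)" => 5.842,
]

# The serial run is the baseline for every ratio.
serial_ms = last(first(medians_ms))

# Speedup = serial median / method median, rounded to one decimal place.
speedups = [name => round(serial_ms / t, digits=1) for (name, t) in medians_ms]

for (name, s) in speedups
    println(rpad(name, 30), s, "×")
end
```

These ratios are specific to the 32-thread machine used for the benchmarks; other thread counts and hardware will give different numbers.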
| 196 | +!!! tip "Performance Tips" |
| 197 | + For the best performance: |
| 198 | + - Use [`solar_position!`](@ref) with pre-allocated output for minimal allocations |
| 199 | + - Use `DynamicScheduler()` for most workloads |
| 200 | + - Ensure Julia is running with multiple threads (e.g., `--threads=auto`) |
| 201 | + - Process larger batches of timestamps to amortize threading overhead |
| 202 | + |
| 203 | +## Working with Different Time Types |
| 204 | + |
| 205 | +The parallel methods work with both [`DateTime`](https://docs.julialang.org/en/v1/stdlib/Dates/#Dates.DateTime) and [`ZonedDateTime`](https://juliatime.github.io/TimeZones.jl/stable/types/#TimeZones.ZonedDateTime): |
| 206 | + |
| 207 | +```@example parallel |
| 208 | +using TimeZones |
| 209 | +
|
| 210 | +# Using ZonedDateTime (avoiding DST transitions) |
| 211 | +tz = tz"Europe/London" |
| 212 | +# Use a subset of times to avoid DST transition issues in documentation |
| 213 | +summer_times = collect(DateTime(2024, 6, 1):Hour(1):DateTime(2024, 7, 1)) |
| 214 | +zoned_times = ZonedDateTime.(summer_times, tz) |
| 215 | +
|
| 216 | +# Parallel computation with time zone aware timestamps |
| 217 | +zoned_positions = solar_position(obs, zoned_times, PSA(), NoRefraction(), DynamicScheduler()) |
| 218 | +
|
| 219 | +println("Computed $(length(zoned_positions)) positions with time zone awareness") |
| 220 | +``` |
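The example above restricts itself to June because constructing a `ZonedDateTime` from a local `DateTime` can fail around daylight saving transitions. The sketch below shows the failure for a local time that does not exist in London, and one possible workaround: treating the `DateTime` values as UTC via the `from_utc` keyword of TimeZones.jl. Whether UTC interpretation is appropriate depends on how your timestamps were recorded, so treat this as an assumption to check against your data.

```julia
using Dates, TimeZones

tz = tz"Europe/London"

# 01:30 local time on 2024-03-31 does not exist in London:
# clocks jump from 01:00 straight to 02:00 at the start of summer time.
missing_local = DateTime(2024, 3, 31, 1, 30)
try
    ZonedDateTime(missing_local, tz)
catch err
    println("Local-time construction failed: ", typeof(err))
end

# Interpreting the DateTime values as UTC instants avoids the problem,
# because every UTC instant maps to exactly one local time.
utc_times = collect(DateTime(2024, 3, 31, 0):Hour(1):DateTime(2024, 3, 31, 3))
zoned = ZonedDateTime.(utc_times, tz; from_utc=true)

println("Converted $(length(zoned)) UTC timestamps across the transition")
```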
| 221 | + |
| 222 | +## Algorithm Comparison |
| 223 | + |
| 224 | +The parallel interface works with all solar position algorithms: |
| 225 | + |
| 226 | +```@example parallel |
| 227 | +# Test different algorithms in parallel |
| 228 | +algorithms = [PSA(), NOAA(), SPA()] |
| 229 | +
|
| 230 | +for alg in algorithms |
| 231 | + pos = solar_position(obs, times[1:100], alg, NoRefraction(), DynamicScheduler()) |
| 232 | + println("$(nameof(typeof(alg))): azimuth=$(round(pos.azimuth[50], digits=5))°") |
| 233 | +end |
| 234 | +``` |
| 235 | + |
| 236 | +## Refraction Correction |
| 237 | + |
| 238 | +Atmospheric refraction corrections can be applied in parallel computations: |
| 239 | + |
| 240 | +```@example parallel |
| 241 | +# Parallel computation with Bennett refraction correction |
| 242 | +positions_refracted = solar_position( |
| 243 | + obs, |
| 244 | + times, |
| 245 | + PSA(), |
| 246 | + BENNETT(), |
| 247 | + DynamicScheduler() |
| 248 | +) |
| 249 | +
|
| 250 | +println("First position with refraction:") |
| 251 | +println(" Apparent elevation: $(round(positions_refracted.apparent_elevation[1], digits=2))°") |
| 252 | +``` |
| 253 | + |
| 254 | +## Implementation Details |
| 255 | + |
| 256 | +The extension uses OhMyThreads' [`tmap`](https://juliafolds2.github.io/OhMyThreads.jl/stable/refs/api/#OhMyThreads.tmap) |
| 257 | +and [`tmap!`](https://juliafolds2.github.io/OhMyThreads.jl/stable/refs/api/#OhMyThreads.tmap!) |
| 258 | +for task-based parallelism. Each timestamp is processed independently, making the |
| 259 | +computation embarrassingly parallel with no inter-thread communication required. |
| 260 | + |
| 261 | +The results from `tmap` are automatically converted to a [`StructVector`](https://github.com/JuliaArrays/StructArrays.jl) |
| 262 | +for efficient columnar storage compatible with the rest of SolarPosition.jl's API. |
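The pattern described above can be sketched with a toy example. This is not the extension's actual source: `ToyPos` and `toy_position` are hypothetical stand-ins for the package's result type and per-timestamp kernel, and only `tmap`, `DynamicScheduler`, and the `StructArray` constructor are real library calls.

```julia
using OhMyThreads: tmap, DynamicScheduler
using StructArrays

# Hypothetical stand-in for a solar position result (not the package's SolPos type).
struct ToyPos
    azimuth::Float64
    elevation::Float64
end

# Hypothetical per-element kernel: a pure function of a single input,
# so elements can be processed in any order on any thread.
toy_position(hour) = ToyPos(15.0 * hour, 90.0 - abs(12.0 - hour) * 7.5)

hours = collect(0.0:0.5:23.5)

# Map every element independently across tasks; no synchronization is needed.
results = tmap(toy_position, hours; scheduler=DynamicScheduler())

# Convert the array of structs into columnar (struct-of-arrays) storage.
positions = StructArray(results)

println("Computed $(length(positions)) toy positions; first azimuth = $(positions.azimuth[1])")
```

Columnar storage means a field such as `positions.azimuth` is a plain `Vector{Float64}`, which is why field access on the result is cheap.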
| 263 | + |
| 264 | +## See Also |
| 265 | + |
| 266 | +- [Solar Positioning](@ref solar-positioning-algorithms) - Available positioning algorithms |
| 267 | +- [Refraction Correction](@ref refraction-correction) - Atmospheric refraction methods |
| 268 | +- [OhMyThreads.jl Documentation](https://juliafolds2.github.io/OhMyThreads.jl/stable/) - Task-based parallelism framework |
| 269 | +- [Julia Threading Documentation](https://docs.julialang.org/en/v1/manual/multi-threading/) - Julia's threading capabilities |