
Commit 5e9775c

Updated sorcha docs on Parallelization (#1183)

* Updated sorcha docs on Parallelization. In multi_sorcha.py, fixed the order of the config and stats parameters in the run_sorcha function, and removed path_inputs as it wasn't being used. Also made --norbits no longer required: the script now splits the ObjIDs selected via the input file and --chunksize across cores as equally as possible using np.array_split. Updated multi_sorcha.sh and hpc.rst to match these changes to multi_sorcha.py.
* Used black.

1 parent 7e52352 commit 5e9775c
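For context, np.array_split is what makes --norbits unnecessary: unlike np.split, it accepts a section count that doesn't evenly divide the input and returns chunks whose lengths differ by at most one. A minimal sketch (the 10-object input is made up for illustration):

    import numpy as np

    # Split 10 object indices across 4 cores: chunk sizes differ by at most one.
    splits = np.array_split(np.arange(10), 4)
    print([len(s) for s in splits])  # [3, 3, 2, 2]
    print(splits[0])                 # [0 1 2]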

3 files changed: +13, -16 lines

docs/example_files/multi_sorcha.py

Lines changed: 9 additions & 12 deletions
@@ -3,8 +3,9 @@
 from multiprocessing import Pool
 import pandas as pd
 import sqlite3
+import numpy as np

-def run_sorcha(i, args, path_inputs, pointings, instance,stats, config):
+def run_sorcha(i, args, pointings, instance, config, stats):
     print(f"sorcha run -c {config} --pd {pointings} -o {args.path}{instance}/ -t {instance}_{i} --ob {args.path}{instance}/orbits_{i}.csv -p {args.path}{instance}/physical_{i}.csv --st {stats}_{i}", flush=True)
     os.system(f"sorcha run -c {config} --pd {pointings} -o {args.path}{instance}/ -t {instance}_{i} --ob {args.path}{instance}/orbits_{i}.csv -p {args.path}{instance}/physical_{i}.csv --st {stats}_{i}")

@@ -16,43 +17,39 @@ def run_sorcha(i, args, path_inputs, pointings, instance,stats, config):
 parser.add_argument('--input_physical', type=str)
 parser.add_argument('--path', type=str)
 parser.add_argument('--chunksize', type=int)
-parser.add_argument('--norbits', type=int)
 parser.add_argument('--cores', type=int)
 parser.add_argument('--instance', type=int)
 parser.add_argument('--cleanup', action='store_true')
-parser.add_argument('--copy_inputs', action='store_true')
 parser.add_argument('--pointings', type=str)
 parser.add_argument('--stats', type=str)
 parser.add_argument('--config', type=str)
 args = parser.parse_args()
 chunk = args.chunksize
 instance = args.instance
-norbits = args.norbits
 pointings = args.pointings
 path = args.path
 config = args.config
 stats=args.stats

 orbits = tb.Table.read(args.input_orbits)
 orbits = orbits[instance*chunk:(instance+1)*chunk]
+orb_splits = np.array_split(range(len(orbits)), args.cores)
+
 physical = tb.Table.read(args.input_physical)
 physical = physical[instance*chunk:(instance+1)*chunk]
+phys_splits = np.array_split(range(len(physical)), args.cores)

 os.system(f'mkdir {instance}')
-
-
-if args.copy_inputs:
-    os.system(f'cp {pointings} {instance}/')
-    path_inputs = f'{instance}'
+os.system(f'cp {pointings} {instance}/')

 for i in range(args.cores):
-    sub_orb = orbits[i*norbits:(i+1)*norbits]
-    sub_phys = physical[i*norbits:(i+1)*norbits]
+    sub_orb = orbits[orb_splits[i]]
+    sub_phys = physical[phys_splits[i]]
     sub_orb.write(f"{args.path}{instance}/orbits_{i}.csv", overwrite=True)
     sub_phys.write(f"{args.path}{instance}/physical_{i}.csv", overwrite=True)

 with Pool(processes=args.cores) as pool:
-    pool.starmap(run_sorcha, [(i, args, path_inputs, pointings, instance, config, stats) for i in range(args.cores)])
+    pool.starmap(run_sorcha, [(i, args, pointings, instance, config, stats) for i in range(args.cores)])

 data = []
 for i in range(args.cores):
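
Worth noting, since it is the bug this commit fixes: Pool.starmap unpacks each tuple positionally, so the parameter order in run_sorcha must match the order of the tuples built in the comprehension. A minimal sketch of that behavior (toy function and values, not sorcha's own):

    from multiprocessing import Pool

    def run(i, config, stats):
        # Positional unpacking: swapping config and stats here (as the old
        # signature did) would silently hand --st the config path and vice versa.
        return f"run {i}: -c {config} --st {stats}_{i}"

    if __name__ == "__main__":
        with Pool(processes=2) as pool:
            print(pool.starmap(run, [(i, "my_config.ini", "mbastats") for i in range(2)]))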

docs/example_files/multi_sorcha.sh

Lines changed: 1 addition & 1 deletion
@@ -6,4 +6,4 @@
 #SBATCH --time=24:00:00
 #SBATCH --output=log-%a.log

-python3 multi_sorcha.py --config my_config.ini --input_orbits my_orbits.csv --input_physical my_colors.csv --pointings my_pointings.db --path ./ --chunksize $(($1 * $2)) --norbits $1 --cores $2 --instance ${SLURM_ARRAY_TASK_ID} --cleanup --copy_inputs
+python3 multi_sorcha.py --config my_config.ini --input_orbits my_orbits.csv --input_physical my_colors.csv --pointings my_pointings.db --path ./ --chunksize $1 --cores $2 --instance ${SLURM_ARRAY_TASK_ID} --cleanup

docs/hpc.rst

Lines changed: 3 additions & 3 deletions
@@ -43,13 +43,13 @@ Below is a more complex example of a Slurm script. Here, multi_sorcha.sh calls m
 .. note::
     We provide these here for you to copy, paste, and edit as needed. You might have to make some slight modifications to both the Slurm script and multi_sorcha.py, for example if you're using ``Sorcha`` without calling the stats file.

-``multi_sorcha.sh`` requests many parallel Slurm jobs of ``multi_sorcha.py``, feeding each a different --instance parameter. After changing ‘my_orbits.csv’, ‘my_colors.csv’, ‘my_pointings.db’, ‘my_config.ini’, and the various Slurm parameters to match the above, you could generate 10 jobs, each with 4 cores running 25 orbits each, as follows::
+``multi_sorcha.sh`` requests many parallel Slurm jobs of ``multi_sorcha.py``, feeding each a different --instance parameter. After changing ‘my_orbits.csv’, ‘my_colors.csv’, ‘my_pointings.db’, ‘my_config.ini’, and the various Slurm parameters to match the above, for a file of 1000 objects you could generate 10 jobs with 4 cores running 25 orbits each, as follows::

-    sbatch --array=0-9 multi_sorcha.sh 25 4
+    sbatch --array=0-9 multi_sorcha.sh 100 4

 You can run multi_sorcha.py on the command line as well::

-    python multi_sorcha.py --config sorcha_config_demo.ini --input_orbits mba_sample_1000_orbit.csv --input_physical mba_sample_1000_physical.csv --pointings baseline_v2.0_1yr.db --path ./ --chunksize 1000 --norbits 250 --cores 4 --instance 0 --stats mbastats --cleanup --copy_inputs
+    python multi_sorcha.py --config sorcha_config_demo.ini --input_orbits mba_sample_1000_orbit.csv --input_physical mba_sample_1000_physical.csv --pointings baseline_v2.0_1yr.db --path ./ --chunksize 1000 --cores 4 --instance 0 --stats mbastats --cleanup

 This will generate a single output file. It should work fine on a laptop, and be a bit (but not quite 4x) faster than the single-core equivalent due to overheads.
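
To make the arithmetic in the example concrete: each Slurm array task takes one chunk of --chunksize rows, and np.array_split then spreads that chunk over the task's cores. A quick sketch of the 1000-object case above (the numbers mirror the docs; nothing here is sorcha-specific):

    import numpy as np

    n_objects, chunksize, cores = 1000, 100, 4  # sbatch --array=0-9 multi_sorcha.sh 100 4

    for instance in range(10):  # one pass per Slurm array task
        chunk = np.arange(n_objects)[instance * chunksize:(instance + 1) * chunksize]
        per_core = [len(s) for s in np.array_split(chunk, cores)]
        # Every task gets 100 objects, split 25/25/25/25 across its 4 cores.
        assert per_core == [25, 25, 25, 25]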
