Open
Description
When SARvey is run with sarvey -f config.json 0 2 using 48 cores, it crashes in Step 2 with:
OSError: [Errno 12] Cannot allocate memory
This happens specifically at:
with multiprocessing.Pool(processes=num_cores) as pool:
This does not happen when the steps are run individually (0 0, 1 1, 2 2).
System:
- 187 GB RAM, 48 cores
- ~21 GB free memory when the crash occurs:

               total        used        free      shared  buff/cache   available
Mem:           187Gi       161Gi        21Gi       316Mi       6.2Gi        25Gi
Swap:             0B          0B          0B
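
The free output shows what is resident, but whether os.fork() fails with Errno 12 can also depend on how much memory the kernel is willing to commit, which is governed by vm.overcommit_memory. The following is a minimal diagnostic sketch, assuming a Linux node where /proc/meminfo and /proc/sys/vm/overcommit_memory are readable. The function names parse_meminfo and commit_headroom_gib are hypothetical and not part of SARvey.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def commit_headroom_gib(meminfo):
    """Return CommitLimit - Committed_AS in GiB (negative means overcommitted)."""
    return (meminfo["CommitLimit"] - meminfo["Committed_AS"]) / 1024 ** 2


if __name__ == "__main__":
    try:
        with open("/proc/sys/vm/overcommit_memory") as fh:
            print("vm.overcommit_memory =", fh.read().strip())
        with open("/proc/meminfo") as fh:
            info = parse_meminfo(fh.read())
    except FileNotFoundError:
        print("No Linux /proc filesystem found; nothing to report.")
    else:
        print("MemAvailable:    %.1f GiB" % (info["MemAvailable"] / 1024 ** 2))
        print("Commit headroom: %.1f GiB" % commit_headroom_gib(info))
```

CommitLimit is only strictly enforced when vm.overcommit_memory is 2. In the other modes the headroom is just an indication of how far the node is already committed at the moment the pool is created.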
Full run output:
(sarvey) /usr/bin/time -v sarvey -f config.json 0 2
Cannot load backend 'QtAgg' which requires the 'qt' interactive framework, as 'headless' is currently running
2025-06-12 22:01:36,361 - INFO - SARvey version: 1.2.0 - Strawberry Pie, 2025-02-19_01, Run: MTInSAR
. _____ _____
+------ / \ ------ / ____| /\ | __ \
| / / | (___ / \ | |__) |_ _____ _ _
| / / \___ \ / /\ \ | _ /\ \ / / _ \ | | |
| /\\ / / ____) / ____ \| | \ \ \ V / __/ |_| |
| / \\/ / |_____/_/ \_\_| \_\ \_/ \___|\__, |
| / \ / __/ |
| \ / / v1.2.0 - Strawberry Pie |___/
\\ / /... 2025-02-19_01 |
/ \\/ / :... |
/ / / :... MTInSAR |
/ / / :... |
/ / _______ :... _________|
\/ \______ :... ____________/ |
+-------------------- \________:___/ --------------------+
2025-06-12 22:01:36,366 - INFO - Parameter value default
2025-06-12 22:01:36,366 - INFO - _________ _____ _______
2025-06-12 22:01:36,366 - INFO - input_path inputs/ inputs/
2025-06-12 22:01:36,366 - INFO - output_path /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs <--- outputs/
2025-06-12 22:01:36,366 - INFO - num_cores 48 <--- 50
2025-06-12 22:01:36,366 - INFO - num_patches 1 1
2025-06-12 22:01:36,366 - INFO - apply_temporal_unwrapping True True
2025-06-12 22:01:36,366 - INFO - spatial_unwrapping_method puma puma
2025-06-12 22:01:36,366 - INFO - logging_level INFO INFO
2025-06-12 22:01:36,366 - INFO - logfile_path logfiles/ logfiles/
2025-06-12 22:01:36,366 - INFO -
2025-06-12 22:01:36,366 - INFO - ---------------------------------------------------------------------------------
2025-06-12 22:01:36,366 - INFO - STEP 0: PREPARATION
2025-06-12 22:01:36,366 - INFO - ---------------------------------------------------------------------------------
2025-06-12 22:01:36,366 - INFO - Parameter value default
2025-06-12 22:01:36,366 - INFO - _________ _____ _______
2025-06-12 22:01:36,366 - INFO - start_date None None
2025-06-12 22:01:36,366 - INFO - end_date None None
2025-06-12 22:01:36,366 - INFO - ifg_network_type sb sb
2025-06-12 22:01:36,366 - INFO - num_ifgs 3 3
2025-06-12 22:01:36,366 - INFO - max_tbase 400 <--- 100
2025-06-12 22:01:36,366 - INFO - filter_window_size 9 9
2025-06-12 22:01:36,366 - INFO -
2025-06-12 22:01:36,366 - INFO - ########## PREPARE PROCESSING: LOAD INPUT ##########
open slc file: slcStack.h5
2025-06-12 22:01:37,767 - INFO - Orbit direction: descending
2025-06-12 22:01:37,768 - INFO - Start date: 2017-10-26
2025-06-12 22:01:37,768 - INFO - Stop date: 2023-10-24
2025-06-12 22:01:37,768 - INFO - Number of SLC: 137
2025-06-12 22:01:37,768 - INFO - ########## DESIGN IFG NETWORK ##########
2025-06-12 22:01:37,777 - INFO - Small baseline network
2025-06-12 22:01:37,777 - INFO - write IfgNetwork to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/ifg_network.h5
2025-06-12 22:01:38,145 - INFO - temporal baselines: [ 11 22 33 44 55 66 77 88 99 110 121 132 143 154 165 176 187 198
209 220 231 242 253 264 275 286 297 308 319 330 341 352 363 374 385 396
407 418 429 440 451 462 473 484 495 506 517 539 550]
2025-06-12 22:01:39,602 - INFO - ########## GENERATE STACK OF 404 INTERFEROGRAMS & ESTIMATE TEMPORAL COHERENCE ##########
2025-06-12 22:01:39,607 - INFO - Prepare dataset: ifgs of <class 'numpy.complex64'> in size of (568, 669, 404)
2025-06-12 22:01:39,691 - INFO - Prepare dataset: temp_coh of <class 'numpy.float32'> in size of (568, 669)
open slc file: slcStack.h5
2025-06-12 22:02:16,026 - INFO - Patches processed: 1/1
2025-06-12 22:02:16,305 - INFO - Transform coordinates from latitude and longitude (WGS84) to North and East (UTM).
2025-06-12 22:02:16,647 - INFO - write data to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/coordinates_utm.h5...
2025-06-12 22:02:17,527 - INFO - write data to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/background_map.h5...
2025-06-12 22:02:18,972 - INFO - reading box None from file: /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/temporal_coherence.h5 ...
descending orbit -> flip left-right
2025-06-12 22:02:20,350 - INFO - ---------------------------------------------------------------------------------
2025-06-12 22:02:20,350 - INFO - STEP 1: CONSISTENCY CHECK
2025-06-12 22:02:20,350 - INFO - ---------------------------------------------------------------------------------
2025-06-12 22:02:20,350 - INFO - Parameter value default
2025-06-12 22:02:20,350 - INFO - _________ _____ _______
2025-06-12 22:02:20,351 - INFO - coherence_p1 0.8 <--- 0.9
2025-06-12 22:02:20,351 - INFO - grid_size None <--- 200
2025-06-12 22:02:20,351 - INFO - mask_p1_file None <---
2025-06-12 22:02:20,351 - INFO - num_nearest_neighbours 50 <--- 30
2025-06-12 22:02:20,351 - INFO - max_arc_length 999999 <--- None
2025-06-12 22:02:20,351 - INFO - velocity_bound 0.1 0.1
2025-06-12 22:02:20,351 - INFO - dem_error_bound 250.0 <--- 100.0
2025-06-12 22:02:20,351 - INFO - num_optimization_samples 100 100
2025-06-12 22:02:20,351 - INFO - arc_unwrapping_coherence 0.5 <--- 0.6
2025-06-12 22:02:20,351 - INFO - min_num_arc 3 3
2025-06-12 22:02:20,351 - INFO -
2025-06-12 22:02:20,499 - INFO - reading box None from file: /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/temporal_coherence.h5 ...
2025-06-12 22:02:22,430 - INFO - No mask for area of interest given.
2025-06-12 22:02:25,458 - INFO - reading box None from file: /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/ifg_stack.h5 ...
2025-06-12 22:02:29,084 - INFO - write data to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/p1_ifg_wr.h5...
2025-06-12 22:02:29,139 - INFO - create distance matrix between all points...
2025-06-12 22:02:36,913 - INFO - Triangulate points with 50-nearest neighbours.
[==================================================] 13344/13344 points triangulated 2s / 0s
2025-06-12 22:02:39,097 - INFO - Triangulate points with global delaunay.
2025-06-12 22:02:39,564 - INFO - remove arcs with length > 999999.
2025-06-12 22:02:40,320 - INFO - retrieve arcs from adjacency matrix.
2025-06-12 22:02:41,324 - INFO - no. arcs: 392246
2025-06-12 22:03:07,520 - INFO - ifg arc observations created.
2025-06-12 22:03:07,520 - INFO - write data to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/point_network.h5...
2025-06-12 22:03:10,762 - INFO - ########## TEMPORAL UNWRAPPING: AMBIGUITY FUNCTION ##########
2025-06-12 22:03:10,762 - INFO - start parallel processing with 48 cores.
2025-06-12 22:04:55,730 - INFO - Finished temporal unwrapping.
2025-06-12 22:04:55,740 - INFO - write data to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/point_network_parameter.h5...
2025-06-12 22:05:47,313 - INFO - Detect points with low quality arcs (mean): < 0.5
2025-06-12 22:05:47,313 - INFO - Removal of points whose arcs are incoherent in average.
2025-06-12 22:06:45,876 - INFO - Detected 58 point(s) with mean coherence of all connected arcs < 0.5
2025-06-12 22:06:45,877 - INFO - Removal of low quality arcs: < 0.5
2025-06-12 22:06:45,877 - INFO - Removed 5265 arc(s)
2025-06-12 22:06:46,277 - INFO - Removal of arcs and PSC that cannot be tested.
2025-06-12 22:07:29,298 - INFO - Detected 5 point(s) with less than 3 arcs
2025-06-12 22:07:29,298 - INFO - Remove 58 point(s)
2025-06-12 22:08:32,912 - INFO - Removed 657 arc(s) connected to the removed point(s)
2025-06-12 22:09:18,067 - INFO - ########## NOISY POINT REMOVAL BASED ON ARC PARAMETERS ##########
2025-06-12 22:09:18,067 - INFO - Selection of the reference PSC
2025-06-12 22:10:14,981 - INFO - Spatial integration to detect noisy point
2025-06-12 22:10:14,981 - INFO - ITERATION: 0
2025-06-12 22:31:59,929 - INFO - Maximum RMSE DEM correction: 12.86 m
2025-06-12 22:31:59,930 - INFO - Maximum RMSE velocity: 0.0016 m / year
2025-06-12 22:32:03,434 - INFO - No noisy pixels detected.
2025-06-12 22:32:03,435 - INFO - write data to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/point_network_parameter.h5...
2025-06-12 22:32:05,887 - INFO - write data to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/p1_ifg_wr.h5...
2025-06-12 22:32:05,934 - INFO - ---------------------------------------------------------------------------------
2025-06-12 22:32:05,935 - INFO - STEP 2: UNWRAPPING
2025-06-12 22:32:05,935 - INFO - ---------------------------------------------------------------------------------
2025-06-12 22:32:05,935 - INFO - Parameter value default
2025-06-12 22:32:05,935 - INFO - _________ _____ _______
2025-06-12 22:32:05,935 - INFO - use_arcs_from_temporal_unwrapping True True
2025-06-12 22:32:05,935 - INFO -
2025-06-12 22:32:07,640 - INFO - read from /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/p1_ifg_wr.h5
2025-06-12 22:32:08,225 - INFO - Integrate parameters from arcs to points.
2025-06-12 22:32:08,225 - INFO - Integrate DEM correction.
2025-06-12 22:34:10,946 - INFO - Integrate mean velocity.
2025-06-12 22:35:44,046 - INFO - Remove phase contributions from mean velocity and DEM correction from wrapped phase of points.
2025-06-12 22:35:44,600 - INFO - ########## SPATIAL UNWRAPPING: puma ##########
2025-06-12 22:35:44,600 - INFO - start parallel processing with 48 cores.
Traceback (most recent call last):
File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/bin/sarvey", line 8, in <module>
sys.exit(main())
File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/sarvey/sarvey/sarvey_mti.py", line 311, in main
run(config=config, args=args, logger=logger)
File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/sarvey/sarvey/sarvey_mti.py", line 134, in run
proc_obj.runUnwrappingTimeAndSpace()
File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/sarvey/sarvey/processing.py", line 447, in runUnwrappingTimeAndSpace
unw_res_phase = spatialUnwrapping(num_ifgs=point_obj.ifg_net_obj.num_ifgs,
File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/sarvey/sarvey/unwrapping.py", line 493, in spatialUnwrapping
with multiprocessing.Pool(processes=num_cores) as pool:
File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/pool.py", line 215, in __init__
self._repopulate_pool()
File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/pool.py", line 306, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/pool.py", line 329, in _repopulate_pool_static
w.start()
File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/context.py", line 281, in _Popen
return Popen(process_obj)
File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/popen_fork.py", line 66, in _launch
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
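
The traceback ends in os.fork(), which the default fork start method on Linux uses to create each pool worker from the already large parent process. One possible workaround, sketched here as an assumption and not as SARvey's actual fix, is to build the pool from a forkserver or spawn context so that workers are not forked directly from that parent. get_pool_context and make_pool are hypothetical helpers, not SARvey functions.

```python
import multiprocessing


def get_pool_context(start_method="forkserver"):
    """Return a multiprocessing context for the given start method."""
    return multiprocessing.get_context(start_method)


def make_pool(num_cores, start_method="forkserver"):
    """Create a worker pool from an explicit start-method context.

    "forkserver" and "spawn" start workers from a fresh interpreter
    rather than duplicating the (possibly very large) parent process.
    """
    ctx = get_pool_context(start_method)
    return ctx.Pool(processes=num_cores)


# Usage (inside an `if __name__ == "__main__":` guard):
#     with make_pool(48) as pool:
#         results = pool.map(abs, [-1, -2, 3])
```

With these start methods the worker function and its arguments must be picklable, and large arrays are sent to the workers instead of being inherited, so this trades fork-time memory pressure for some transfer overhead. Whether it resolves this particular crash has not been verified here.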
Command exited with non-zero status 1
Command being timed: "sarvey -f config.json 0 2"
User time (seconds): 6465.86
System time (seconds): 241.89
Percent of CPU this job got: 319%
Elapsed (wall clock) time (h:mm:ss or m:ss): 34:57.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 166424064
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 14916
Minor (reclaiming a frame) page faults: 5385893
Voluntary context switches: 329510
Involuntary context switches: 11844
Swaps: 0
File system inputs: 553872
File system outputs: 10104968
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1
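
The Maximum resident set size line is reported in kilobytes, which makes it hard to compare with the free output at a glance. A small sketch that extracts the value from /usr/bin/time -v output and converts it to GiB; the function name max_rss_gib is illustrative only.

```python
import re


def max_rss_gib(time_output):
    """Return the peak RSS in GiB from `/usr/bin/time -v` output, or None."""
    match = re.search(
        r"Maximum resident set size \(kbytes\):\s*(\d+)", time_output
    )
    if match is None:
        return None
    return int(match.group(1)) / 1024 ** 2


if __name__ == "__main__":
    report = "Maximum resident set size (kbytes): 166424064"
    peak = max_rss_gib(report)
    # prints: peak RSS: 158.7 GiB (85% of 187 GiB)
    print("peak RSS: %.1f GiB (%.0f%% of 187 GiB)" % (peak, 100 * peak / 187))
```

A peak of about 158.7 GiB is close to the 161Gi shown as used by free, which suggests that most of the node's memory was already held by the run when Step 2 tried to start its worker pool.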