
Step 2 crashes after Step 1 with 48 cores due to memory exhaustion (OSError: [Errno 12] Cannot allocate memory) #100

@epehlivanli

Description


When SARvey is run as sarvey -f config.json 0 2 (Steps 0 to 2 in a single call) with 48 cores, it crashes in Step 2 with:

OSError: [Errno 12] Cannot allocate memory

This happens specifically at the pool creation in spatialUnwrapping (sarvey/unwrapping.py, line 493):

with multiprocessing.Pool(processes=num_cores) as pool:

The crash does not happen when the steps are run individually (0 0, then 1 1, then 2 2).
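Given that observation, a possible interim workaround is to launch each step as its own sarvey process, so that Step 2 starts in a fresh process instead of one that still holds the memory accumulated during Steps 0 and 1. The sketch below uses only the command-line form already shown in this issue (sarvey -f config.json START STOP); the helper names step_commands and run_steps_separately are hypothetical.

```python
import subprocess
from typing import List


def step_commands(config: str = "config.json", first: int = 0, last: int = 2) -> List[List[str]]:
    """Build one 'sarvey -f CONFIG N N' command per step, from first to last."""
    return [["sarvey", "-f", config, str(step), str(step)]
            for step in range(first, last + 1)]


def run_steps_separately(config: str = "config.json", first: int = 0, last: int = 2) -> None:
    """Run each step in its own sarvey process, one after another."""
    for cmd in step_commands(config, first, last):
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError as soon as a step exits non-zero,
        # so later steps are not started on top of a failed one.
        subprocess.run(cmd, check=True)
```

Calling run_steps_separately("config.json") runs the same three commands listed above in sequence and stops at the first step that fails.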

System:

  • 187 GB RAM, 48 cores

  • ~21 GiB free (~25 GiB available) when the crash occurs:

               total        used        free      shared  buff/cache   available
Mem:           187Gi       161Gi        21Gi       316Mi       6.2Gi        25Gi
Swap:             0B          0B          0B
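The traceback shows the failure inside os.fork(), which the default fork start method on Linux uses to create every pool worker as a copy of the parent process. Why the kernel refuses the fork depends on its memory-accounting settings (for example vm.overcommit_memory) and on any limits set by the batch system, none of which this issue shows, so this is a hypothesis rather than a diagnosis: with about 161 GiB in use and no swap, cloning a parent of that size is the step most likely to hit a limit. A hypothetical change (not SARvey's current code) would create the pool from a "forkserver" or "spawn" context, which start workers from a fresh interpreter instead of copying the parent. The helpers choose_start_method and make_pool are illustrative names.

```python
import multiprocessing
from typing import Optional, Sequence

# Start methods that do not clone the (possibly very large) parent process.
# This preference order is an assumption, not something SARvey defines.
PREFERRED_METHODS = ("forkserver", "spawn")


def choose_start_method(preferred: Sequence[str] = PREFERRED_METHODS) -> str:
    """Return the first preferred start method supported on this platform."""
    available = multiprocessing.get_all_start_methods()
    for method in preferred:
        if method in available:
            return method
    # get_all_start_methods() lists the platform default first.
    return available[0]


def make_pool(num_cores: int, method: Optional[str] = None):
    """Create a multiprocessing Pool whose workers use a non-fork start method.

    Hypothetical drop-in for multiprocessing.Pool(processes=num_cores).
    """
    ctx = multiprocessing.get_context(method or choose_start_method())
    return ctx.Pool(processes=num_cores)
```

In spatialUnwrapping, with multiprocessing.Pool(processes=num_cores) as pool: would then become with make_pool(num_cores) as pool:. The trade-off is that, with these start methods, the worker function must be defined at module level, each task's inputs are pickled and sent to the workers instead of being inherited, and the entry point must not start processes at import time (the usual if __name__ == "__main__": guard).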

Full log of the run:

(sarvey) /usr/bin/time -v sarvey -f config.json 0 2
Cannot load backend 'QtAgg' which requires the 'qt' interactive framework, as 'headless' is currently running
2025-06-12 22:01:36,361 - INFO - SARvey version: 1.2.0 - Strawberry Pie, 2025-02-19_01, Run: MTInSAR
                                .            _____         _____
                      +------  / \  ------  / ____|  /\   |  __ \
                      |       /  /         | (___   /  \  | |__) |_   _____ _   _
                      |      /  /           \___ \ / /\ \ |  _  /\ \ / / _ \ | | |
                      |   /\\  /  /         ____) / ____ \| | \ \ \ V /  __/ |_| |
                      |  /  \\/  /         |_____/_/    \_\_|  \_\ \_/ \___|\__, |
                      | /    \  /                                            __/ |
                      | \    / /               v1.2.0 - Strawberry Pie      |___/
                        \\  / /...             2025-02-19_01                     |
                       / \\/ /    :...                                           |
                      /  /  /         :...     MTInSAR                           |
                     /  /  /              :...                                   |
                    /  /       _______        :...                      _________|
                     \/               \______     :...     ____________/         |
                      +--------------------  \________:___/  --------------------+
    
2025-06-12 22:01:36,366 - INFO -                               Parameter           value         default
2025-06-12 22:01:36,366 - INFO -                               _________           _____         _______
2025-06-12 22:01:36,366 - INFO -                              input_path         inputs/         inputs/
2025-06-12 22:01:36,366 - INFO -                             output_path /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs <---   outputs/
2025-06-12 22:01:36,366 - INFO -                               num_cores              48 <---         50
2025-06-12 22:01:36,366 - INFO -                             num_patches               1               1
2025-06-12 22:01:36,366 - INFO -               apply_temporal_unwrapping            True            True
2025-06-12 22:01:36,366 - INFO -               spatial_unwrapping_method            puma            puma
2025-06-12 22:01:36,366 - INFO -                           logging_level            INFO            INFO
2025-06-12 22:01:36,366 - INFO -                            logfile_path       logfiles/       logfiles/
2025-06-12 22:01:36,366 - INFO - 
2025-06-12 22:01:36,366 - INFO -     ---------------------------------------------------------------------------------
2025-06-12 22:01:36,366 - INFO -                          STEP 0:     PREPARATION
2025-06-12 22:01:36,366 - INFO -     ---------------------------------------------------------------------------------
2025-06-12 22:01:36,366 - INFO -                               Parameter           value         default
2025-06-12 22:01:36,366 - INFO -                               _________           _____         _______
2025-06-12 22:01:36,366 - INFO -                              start_date            None            None
2025-06-12 22:01:36,366 - INFO -                                end_date            None            None
2025-06-12 22:01:36,366 - INFO -                        ifg_network_type              sb              sb
2025-06-12 22:01:36,366 - INFO -                                num_ifgs               3               3
2025-06-12 22:01:36,366 - INFO -                               max_tbase             400 <---        100
2025-06-12 22:01:36,366 - INFO -                      filter_window_size               9               9
2025-06-12 22:01:36,366 - INFO - 
2025-06-12 22:01:36,366 - INFO - ########## PREPARE PROCESSING: LOAD INPUT ##########
open slc file: slcStack.h5
2025-06-12 22:01:37,767 - INFO - Orbit direction: descending
2025-06-12 22:01:37,768 - INFO - Start date: 2017-10-26
2025-06-12 22:01:37,768 - INFO - Stop date: 2023-10-24
2025-06-12 22:01:37,768 - INFO - Number of SLC: 137
2025-06-12 22:01:37,768 - INFO - ########## DESIGN IFG NETWORK ##########
2025-06-12 22:01:37,777 - INFO - Small baseline network
2025-06-12 22:01:37,777 - INFO - write IfgNetwork to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/ifg_network.h5
2025-06-12 22:01:38,145 - INFO - temporal baselines: [ 11  22  33  44  55  66  77  88  99 110 121 132 143 154 165 176 187 198
 209 220 231 242 253 264 275 286 297 308 319 330 341 352 363 374 385 396
 407 418 429 440 451 462 473 484 495 506 517 539 550]
2025-06-12 22:01:39,602 - INFO - ########## GENERATE STACK OF 404 INTERFEROGRAMS & ESTIMATE TEMPORAL COHERENCE ##########
2025-06-12 22:01:39,607 - INFO - Prepare dataset: ifgs                      of <class 'numpy.complex64'> in size of (568, 669, 404)
2025-06-12 22:01:39,691 - INFO - Prepare dataset: temp_coh                  of <class 'numpy.float32'>   in size of (568, 669)
open slc file: slcStack.h5
2025-06-12 22:02:16,026 - INFO - Patches processed:	 1/1
2025-06-12 22:02:16,305 - INFO - Transform coordinates from latitude and longitude (WGS84) to North and East (UTM).
2025-06-12 22:02:16,647 - INFO - write data to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/coordinates_utm.h5...
2025-06-12 22:02:17,527 - INFO - write data to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/background_map.h5...
2025-06-12 22:02:18,972 - INFO - reading box None from file: /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/temporal_coherence.h5 ...
descending orbit -> flip left-right
2025-06-12 22:02:20,350 - INFO -     ---------------------------------------------------------------------------------
2025-06-12 22:02:20,350 - INFO -                          STEP 1:     CONSISTENCY CHECK
2025-06-12 22:02:20,350 - INFO -     ---------------------------------------------------------------------------------
2025-06-12 22:02:20,350 - INFO -                               Parameter           value         default
2025-06-12 22:02:20,350 - INFO -                               _________           _____         _______
2025-06-12 22:02:20,351 - INFO -                            coherence_p1             0.8 <---        0.9
2025-06-12 22:02:20,351 - INFO -                               grid_size            None <---        200
2025-06-12 22:02:20,351 - INFO -                            mask_p1_file            None <---           
2025-06-12 22:02:20,351 - INFO -                  num_nearest_neighbours              50 <---         30
2025-06-12 22:02:20,351 - INFO -                          max_arc_length          999999 <---       None
2025-06-12 22:02:20,351 - INFO -                          velocity_bound             0.1             0.1
2025-06-12 22:02:20,351 - INFO -                         dem_error_bound           250.0 <---      100.0
2025-06-12 22:02:20,351 - INFO -                num_optimization_samples             100             100
2025-06-12 22:02:20,351 - INFO -                arc_unwrapping_coherence             0.5 <---        0.6
2025-06-12 22:02:20,351 - INFO -                             min_num_arc               3               3
2025-06-12 22:02:20,351 - INFO - 
2025-06-12 22:02:20,499 - INFO - reading box None from file: /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/temporal_coherence.h5 ...
2025-06-12 22:02:22,430 - INFO - No mask for area of interest given.
2025-06-12 22:02:25,458 - INFO - reading box None from file: /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/ifg_stack.h5 ...
2025-06-12 22:02:29,084 - INFO - write data to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/p1_ifg_wr.h5...
2025-06-12 22:02:29,139 - INFO - create distance matrix between all points...
2025-06-12 22:02:36,913 - INFO - Triangulate points with 50-nearest neighbours.
[==================================================] 13344/13344 points triangulated    2s /     0s
2025-06-12 22:02:39,097 - INFO - Triangulate points with global delaunay.
2025-06-12 22:02:39,564 - INFO - remove arcs with length > 999999.
2025-06-12 22:02:40,320 - INFO - retrieve arcs from adjacency matrix.
2025-06-12 22:02:41,324 - INFO - no. arcs:	392246
2025-06-12 22:03:07,520 - INFO - ifg arc observations created.
2025-06-12 22:03:07,520 - INFO - write data to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/point_network.h5...
2025-06-12 22:03:10,762 - INFO - ########## TEMPORAL UNWRAPPING: AMBIGUITY FUNCTION ##########
2025-06-12 22:03:10,762 - INFO - start parallel processing with 48 cores.
2025-06-12 22:04:55,730 - INFO - Finished temporal unwrapping.
2025-06-12 22:04:55,740 - INFO - write data to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/point_network_parameter.h5...
2025-06-12 22:05:47,313 - INFO - Detect points with low quality arcs (mean): < 0.5
2025-06-12 22:05:47,313 - INFO - Removal of points whose arcs are incoherent in average.
2025-06-12 22:06:45,876 - INFO - Detected 58 point(s) with mean coherence of all connected arcs < 0.5 
2025-06-12 22:06:45,877 - INFO - Removal of low quality arcs: < 0.5
2025-06-12 22:06:45,877 - INFO - Removed 5265 arc(s)
2025-06-12 22:06:46,277 - INFO - Removal of arcs and PSC that cannot be tested.
2025-06-12 22:07:29,298 - INFO - Detected 5 point(s) with less than 3 arcs
2025-06-12 22:07:29,298 - INFO - Remove 58 point(s)
2025-06-12 22:08:32,912 - INFO - Removed 657 arc(s) connected to the removed point(s)
2025-06-12 22:09:18,067 - INFO - ########## NOISY POINT REMOVAL BASED ON ARC PARAMETERS ##########
2025-06-12 22:09:18,067 - INFO - Selection of the reference PSC
2025-06-12 22:10:14,981 - INFO - Spatial integration to detect noisy point
2025-06-12 22:10:14,981 - INFO - ITERATION: 0
2025-06-12 22:31:59,929 - INFO - Maximum RMSE DEM correction: 12.86 m
2025-06-12 22:31:59,930 - INFO - Maximum RMSE velocity: 0.0016 m / year
2025-06-12 22:32:03,434 - INFO - No noisy pixels detected.
2025-06-12 22:32:03,435 - INFO - write data to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/point_network_parameter.h5...
2025-06-12 22:32:05,887 - INFO - write data to /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/p1_ifg_wr.h5...
2025-06-12 22:32:05,934 - INFO -     ---------------------------------------------------------------------------------
2025-06-12 22:32:05,935 - INFO -                          STEP 2:     UNWRAPPING
2025-06-12 22:32:05,935 - INFO -     ---------------------------------------------------------------------------------
2025-06-12 22:32:05,935 - INFO -                               Parameter           value         default
2025-06-12 22:32:05,935 - INFO -                               _________           _____         _______
2025-06-12 22:32:05,935 - INFO -       use_arcs_from_temporal_unwrapping            True            True
2025-06-12 22:32:05,935 - INFO - 
2025-06-12 22:32:07,640 - INFO - read from /scratch/08590/emrhnp/MiamiTsxSMDT36/Grove_subset/Grove_newcon/outputs/p1_ifg_wr.h5
2025-06-12 22:32:08,225 - INFO - Integrate parameters from arcs to points.
2025-06-12 22:32:08,225 - INFO - Integrate DEM correction.
2025-06-12 22:34:10,946 - INFO - Integrate mean velocity.
2025-06-12 22:35:44,046 - INFO - Remove phase contributions from mean velocity and DEM correction from wrapped phase of points.
2025-06-12 22:35:44,600 - INFO - ########## SPATIAL UNWRAPPING: puma ##########
2025-06-12 22:35:44,600 - INFO - start parallel processing with 48 cores.
Traceback (most recent call last):
  File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/bin/sarvey", line 8, in <module>
    sys.exit(main())
  File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/sarvey/sarvey/sarvey_mti.py", line 311, in main
    run(config=config, args=args, logger=logger)
  File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/sarvey/sarvey/sarvey_mti.py", line 134, in run
    proc_obj.runUnwrappingTimeAndSpace()
  File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/sarvey/sarvey/processing.py", line 447, in runUnwrappingTimeAndSpace
    unw_res_phase = spatialUnwrapping(num_ifgs=point_obj.ifg_net_obj.num_ifgs,
  File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/sarvey/sarvey/unwrapping.py", line 493, in spatialUnwrapping
    with multiprocessing.Pool(processes=num_cores) as pool:
  File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/pool.py", line 215, in __init__
    self._repopulate_pool()
  File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/pool.py", line 306, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/pool.py", line 329, in _repopulate_pool_static
    w.start()
  File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/context.py", line 281, in _Popen
    return Popen(process_obj)
  File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/work2/08590/emrhnp/stampede3/code/rsmas_insar/tools/miniforge3/envs/sarvey/lib/python3.10/multiprocessing/popen_fork.py", line 66, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
Command exited with non-zero status 1
	Command being timed: "sarvey -f config.json 0 2"
	User time (seconds): 6465.86
	System time (seconds): 241.89
	Percent of CPU this job got: 319%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 34:57.00
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 166424064
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 14916
	Minor (reclaiming a frame) page faults: 5385893
	Voluntary context switches: 329510
	Involuntary context switches: 11844
	Swaps: 0
	File system inputs: 553872
	File system outputs: 10104968
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 1
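For scale: GNU time reports the maximum resident set size in KiB, so 166424064 kB is about 158.7 GiB, close to the 161Gi shown as used by free. This figure most likely belongs to the main sarvey process, which is the one that calls os.fork(). The interferogram stack named in the log, of shape (568, 669, 404) and type complex64 (8 bytes per element), would account for only about 1.1 GiB of that, so most of the resident memory must come from other data held by the process; which data is not visible from this log. The arithmetic, as a small Python check (function names are illustrative):

```python
from math import prod

KIB = 1024
GIB = 1024 ** 3


def kib_to_gib(kib: int) -> float:
    """Convert a KiB count (as reported by GNU time -v) to GiB."""
    return kib * KIB / GIB


def array_gib(shape, itemsize: int) -> float:
    """Size in GiB of a dense array with the given shape and bytes per element."""
    return prod(shape) * itemsize / GIB


max_rss_gib = kib_to_gib(166424064)       # 'Maximum resident set size' above
ifgs_gib = array_gib((568, 669, 404), 8)  # complex64 = 8 bytes per element
print(f"max RSS: {max_rss_gib:.1f} GiB, ifg stack: {ifgs_gib:.2f} GiB")
```

Running it prints max RSS: 158.7 GiB, ifg stack: 1.14 GiB.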

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
