RFT distributed training error #83

@vitanie

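In the RFT stage, following the instructions in the author's GitHub repository, I set NODES=("10.112.2.106" "10.112.2.40") and configured the virtual environment in the launch command:
pdsh -R ssh -w "$NODE" bash -lc "
source ~/anaconda3/bin/activate arl
cd '$FILE_DIR'
export TOKENIZERS_PARALLELISM=false
After this setup, running the script hangs at the point where the model and optimizer are loaded for distributed training. Both machines have the same arl virtual environment and the same code.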
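
One thing that may be worth checking in the launch command is how the multi-line string reaches the remote `bash -lc`. The log below prints `bash: -c: option requires an argument` on both nodes, which would be consistent with the remote shell seeing `bash -lc` followed by a newline and then running the remaining lines itself. This is a guess, not a confirmed cause of the hang. The sketch below reproduces that behavior locally under stated assumptions: `fake_remote` is a hypothetical stand-in for `pdsh -R ssh`, `sh_quote` and `REMOTE_SCRIPT` are illustrative names, and `-l` is dropped so the demo does not read login profiles.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `pdsh -R ssh -w "$NODE"`. Assumption: like ssh, it
# joins its arguments with spaces and lets a second shell re-parse the string.
fake_remote() { bash -c "$*"; }

# Wrap a string in single quotes (escaping embedded single quotes as '\'')
# so that the second shell keeps it as one word.
sh_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Illustrative per-node script; the real one would activate the conda env
# and launch training instead of echoing.
REMOTE_SCRIPT='
cd /tmp
export TOKENIZERS_PARALLELISM=false
echo "parallelism=$TOKENIZERS_PARALLELISM"
'

# 1) Unquoted: the second shell sees `bash -c` followed by a newline, so the
#    inner bash gets no script and complains, and the outer shell then runs
#    the remaining lines itself.
unquoted_out=$(fake_remote bash -c "$REMOTE_SCRIPT" 2>&1)

# 2) Quoted: the whole script survives re-parsing as a single argument.
quoted_out=$(fake_remote bash -c "$(sh_quote "$REMOTE_SCRIPT")" 2>&1)

printf 'unquoted:\n%s\n\nquoted:\n%s\n' "$unquoted_out" "$quoted_out"
```

In the unquoted case the commands still run, only in the outer remote shell rather than inside `bash -lc`, which would explain why training starts despite the error. Whether that changes the environment enough to matter here is not something the log shows.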

bash fsdp.sh
Training directory: /home/xiang.xiao/AgentCPM-GUI-main/rft
Launching on nodes: 10.112.2.106 10.112.2.40
-> Launching on 10.112.2.106 (rank 0)...
-> Launching on 10.112.2.40 (rank 1)...
10.112.2.40: Authorized users only. All activity may be monitored and reported.
10.112.2.40: bash: -c: option requires an argument
10.112.2.106: bash: -c: option requires an argument
10.112.2.106: INFO 08-06 09:38:11 [init.py:239] Automatically detected platform cuda.
10.112.2.106: reward_funcs: [<function action_type_check at 0x7f2999860cc0>, <function action_args_check at 0x7f2999860e00>]
10.112.2.106: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
10.112.2.106: You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00, 29.59it/s]
10.112.2.106: /home/xiang.xiao/anaconda3/envs/arl/lib/python3.11/site-packages/transformers/models/auto/image_processing_auto.py:625: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use slow_image_processor_class, or fast_image_processor_class instead
10.112.2.106: warnings.warn(
10.112.2.106: Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
Loading dataset: 100%|██████████| 15/15 [00:00<00:00, 106274.59it/s]
Loading dataset: 100%|██████████| 15/15 [00:00<00:00, 154581.23it/s]
10.112.2.106: [2025-08-06 09:38:13,519] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
10.112.2.40: INFO 08-06 09:38:17 [init.py:239] Automatically detected platform cuda.
10.112.2.40: reward_funcs: [<function action_type_check at 0x7f7a39938cc0>, <function action_args_check at 0x7f7a39938e00>]
10.112.2.40: You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
10.112.2.40: You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00, 16.14it/s]
10.112.2.40: /home/xiang.xiao/anaconda3/envs/arl/lib/python3.11/site-packages/transformers/models/auto/image_processing_auto.py:625: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use slow_image_processor_class, or fast_image_processor_class instead
10.112.2.40: warnings.warn(
10.112.2.40: Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
10.112.2.106: self.global_sync_address is tcp://10.112.2.106:15000
10.112.2.106: collect_address is tcp://10.112.2.106:15001
10.112.2.106: num_generation is 4
10.112.2.106: num_to_sync is 2 num_to_sync is 2 num_to_sync is 2 num_to_sync is 2
10.112.2.106: gradient_accumulation_steps is 2
10.112.2.106: accelerator.num_processes is 2
10.112.2.106: args.per_device_train_batch_size is 2
10.112.2.106: accelerator num_processes 2 // device_count 1
10.112.2.106: tp_size is %d 1
10.112.2.106: we run into _sync_node_queue
10.112.2.106: _sync_node_queue--------------
10.112.2.106: [rank0]:[W806 09:38:20.084839864 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
10.112.2.106: NCCL version 2.21.5+cuda12.4
Loading dataset: 100%|██████████| 15/15 [00:00<00:00, 72149.72it/s]
Loading dataset: 100%|██████████| 15/15 [00:00<00:00, 85831.60it/s]
10.112.2.40: [2025-08-06 09:38:21,003] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.40: [rank1]:[W806 09:38:32.905694726 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
10.112.2.106: 2025-08-06 09:38:33,208 - ARL - INFO - Listen for stealing at tcp://10.112.2.106:15003
10.112.2.106: max_cache_size is 32
10.112.2.106: 2025-08-06 09:38:33,289 - ARL - INFO - Worker 0 is running on 0 and setup 0MQ.
10.112.2.40: 2025-08-06 09:38:33,290 - ARL - INFO - Worker 1 is running on 1 and setup 0MQ.
10.112.2.40: 2025-08-06 09:38:33,292 - ARL - INFO - Listen for stealing at tcp://10.112.2.40:15003
10.112.2.106: we are returning the GlobalDistributed0MQDataLoader
10.112.2.106: _load_data require data
10.112.2.106: _load_data require data
10.112.2.106: _load_data require data
10.112.2.106: _load_data require data
10.112.2.106: self.chunk_size is 2
10.112.2.106: we run into work_stealing
10.112.2.106: sync_handler----------
10.112.2.106: --------------------------------------------------------11111
10.112.2.106: _master_loop------------------
10.112.2.106: --------------------------------------------------------222
10.112.2.106: --------------------------------------------------------3333
10.112.2.106: --------------------------------------------------------4444444
10.112.2.106: --------------------------------------------------------555555
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04}\x94.']
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: 2025-08-06 09:38:50,636 - ARL - INFO - [ Global GID: 0 | SyncPool Size: 0 | 0 acked / 0 total ] Current 0 sent, 0 ack. Speed 0.00/s.
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: 2025-08-06 09:39:20,637 - ARL - INFO - [ Global GID: 0 | SyncPool Size: 0 | 0 acked / 0 total ] Current 0 sent, 0 ack. Speed 0.00/s.
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: 2025-08-06 09:40:20,638 - ARL - INFO - [ Global GID: 0 | SyncPool Size: 0 | 0 acked / 0 total ] Current 0 sent, 0 ack. Speed 0.00/s.
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: 2025-08-06 09:40:50,638 - ARL - INFO - [ Global GID: 0 | SyncPool Size: 0 | 0 acked / 0 total ] Current 0 sent, 0 ack. Speed 0.00/s.
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: 2025-08-06 09:41:20,638 - ARL - INFO - [ Global GID: 0 | SyncPool Size: 0 | 0 acked / 0 total ] Current 0 sent, 0 ack. Speed 0.00/s.
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: 2025-08-06 09:45:50,642 - ARL - INFO - [ Global GID: 0 | SyncPool Size: 0 | 0 acked / 0 total ] Current 0 sent, 0 ack. Speed 0.00/s.
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: 2025-08-06 09:46:20,642 - ARL - INFO - [ Global GID: 0 | SyncPool Size: 0 | 0 acked / 0 total ] Current 0 sent, 0 ack. Speed 0.00/s.
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: 2025-08-06 09:46:50,642 - ARL - INFO - [ Global GID: 0 | SyncPool Size: 0 | 0 acked / 0 total ] Current 0 sent, 0 ack. Speed 0.00/s.
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: 2025-08-06 09:47:20,643 - ARL - INFO - [ Global GID: 0 | SyncPool Size: 0 | 0 acked / 0 total ] Current 0 sent, 0 ack. Speed 0.00/s.
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: 2025-08-06 09:47:50,643 - ARL - INFO - [ Global GID: 0 | SyncPool Size: 0 | 0 acked / 0 total ] Current 0 sent, 0 ack. Speed 0.00/s.
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: message received is {'tcp://10.112.2.40:15003': 0}
10.112.2.106: sync_sender.send_multipart
10.112.2.106: _sync_node_queue--------------
10.112.2.106: work_stealing parts are [b'SYNC_NODE_QUEUE_LENGTHS', b'\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18tcp://10.112.2.106:15003\x94K\x00\x8c\x17tcp://10.112.2.40:15003\x94K\x00u.']
10.112.2.106: mean_queue_length is 0.0
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
10.112.2.106: we run into work_stealing
10.112.2.106: message received is {'tcp://10.112.2.106:15003': 0}
^Cpdsh@a800-01: interrupt (one more within 1 sec to abort)
pdsh@a800-01: interrupt (one more within 1 sec to abort)
pdsh@a800-01: (^Z within 1 sec to cancel pending threads)
pdsh@a800-01: (^Z within 1 sec to cancel pending threads)
pdsh@a800-01: 10.112.2.40: command in progress
pdsh@a800-01: 10.112.2.106: command in progress

Cleanup: killing local pdsh and remote training processes...
10.112.2.40: Authorized users only. All activity may be monitored and reported.
pdsh@a800-01: 10.112.2.40: ssh exited with exit code 255
^Cpdsh@a800-01: interrupt (one more within 1 sec to abort)
pdsh@a800-01: (^Z within 1 sec to cancel pending threads)
pdsh@a800-01: 10.112.2.106: command in progress

Cleanup: killing local pdsh and remote training processes...
10.112.2.40: Authorized users only. All activity may be monitored and reported.
pdsh@a800-01: 10.112.2.40: ssh exited with exit code 255

Cleanup: killing local pdsh and remote training processes...
10.112.2.40: Authorized users only. All activity may be monitored and reported.
pdsh@a800-01: 10.112.2.40: ssh exited with exit code 255
^Cpdsh@a800-01: interrupt (one more within 1 sec to abort)
pdsh@a800-01: (^Z within 1 sec to cancel pending threads)
pdsh@a800-01: 10.112.2.106: command in progress

Cleanup: killing local pdsh and remote training processes...
10.112.2.40: Authorized users only. All activity may be monitored and reported.
pdsh@a800-01: 10.112.2.40: ssh exited with exit code 255
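
For anyone reading the log above: the second part of each `SYNC_NODE_QUEUE_LENGTHS` message looks like a standard Python pickle. The sketch below decodes the bytes exactly as they were printed, assuming the payload really is a pickled dict that maps node addresses to queue lengths. The variable names are illustrative and are not taken from the ARL code.

```python
import pickle

# Second part of the SYNC_NODE_QUEUE_LENGTHS message, copied from the log.
payload = (
    b"\x80\x04\x95>\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x18"
    b"tcp://10.112.2.106:15003\x94K\x00\x8c\x17"
    b"tcp://10.112.2.40:15003\x94K\x00u."
)

# Only unpickle data from a trusted source; this payload comes from our own log.
queue_lengths = pickle.loads(payload)

# Average queue length across the nodes listed in the payload.
mean_queue_length = sum(queue_lengths.values()) / len(queue_lengths)

print(queue_lengths)
print("mean_queue_length is", mean_queue_length)
```

Both nodes report a queue length of 0, which matches the repeated `mean_queue_length is 0.0` lines and the `0 sent, 0 ack` status lines: the nodes exchange sync messages, but no work is ever queued.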
