Demo Cloth Funnels #10

@rzg258

When I run
python cloth_funnels/run_sim.py \
    name="demo-single" \
    load=../models/longsleeve_canonicalized_alignment.pth \
    eval_tasks=../assets/tasks/longsleeve-single.hdf5 \
    eval=True \
    num_processes=1 \
    episode_length=10 \
    wandb=disabled \
    fold_finish=True \
    dump_visualizations=True
I get the following error:
2025-05-25 11:21:07,270 INFO worker.py:1544 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
SEEDING WITH 0
[Network] Initializing with inputs: rgb_pos
[Network] Initializing factorized network
[Network] Giving deformable network positional encoding
[Network Setup] Load checkpoint specified ../models/longsleeve_canonicalized_alignment.pth
[Network Setup] Action Exploration Probability: 1.0141e-03
[Network Setup] Value Exploration Probability: 1.0284e-06
[Network Setup] Train Steps: 6216
Replay Buffer path: ../experiments/05-25-1121-demo-single/replay_buffer.hdf5
CUDA_VISIBLE_DEVICES: 0
(TaskLoader pid=1111701) [TaskLoader] Loading eval tasks
(SimEnv pid=1111702) *** SIGSEGV received at time=1748143273 on cpu 3 ***
(SimEnv pid=1111702) @ 0x7adb00242520 (unknown) (unknown)
(SimEnv pid=1111702) [2025-05-25 11:21:13,439 E 1111702 1111702] logging.cc:361: *** SIGSEGV received at time=1748143273 on cpu 3 ***
(SimEnv pid=1111702) [2025-05-25 11:21:13,439 E 1111702 1111702] logging.cc:361: @ 0x7adb00242520 (unknown) (unknown)
(SimEnv pid=1111702) Fatal Python error: Segmentation fault
(SimEnv pid=1111702)
(SimEnv pid=1111702) Stack (most recent call first):
(SimEnv pid=1111702) File "/home/rzg/cloth-funnels/cloth_funnels/environment/simEnv.py", line 262 in setup_env
(SimEnv pid=1111702) File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/ray/util/tracing/tracing_helper.py", line 466 in _resume_span
(SimEnv pid=1111702) File "/home/rzg/cloth-funnels/cloth_funnels/environment/simEnv.py", line 212 in __init__
(SimEnv pid=1111702) File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/ray/util/tracing/tracing_helper.py", line 466 in _resume_span
(SimEnv pid=1111702) File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/ray/_private/function_manager.py", line 674 in actor_method_executor
(SimEnv pid=1111702) File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/ray/_private/worker.py", line 772 in main_loop
(SimEnv pid=1111702) File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/ray/_private/workers/default_worker.py", line 226 in <module>
2025-05-25 11:21:20,801 WARNING worker.py:1866 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff3eb819befb555444286775b101000000 Worker ID: edc33d05be7f69141d3b6b5681b0a7c8dd7c663a0dfbe30c4f72522e Node ID: a6bd8d6743aea3f8077148bc360c84a1aca73f0cb505c98b937e6076 Worker IP address: 10.202.9.99 Worker port: 38529 Worker PID: 1111702 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
Error executing job with overrides: ['name=demo-single', 'load=../models/longsleeve_canonicalized_alignment.pth', 'eval_tasks=../assets/tasks/longsleeve-single.hdf5', 'eval=True', 'num_processes=1', 'episode_length=5', 'wandb=disabled', 'fold_finish=True', 'dump_visualizations=True']
Traceback (most recent call last):
  File "/home/rzg/cloth-funnels/cloth_funnels/run_sim.py", line 263, in <module>
    main()
  File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/hydra/main.py", line 90, in decorated_main
    _run_hydra(
  File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/hydra/_internal/utils.py", line 216, in run_and_report
    raise ex
  File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/rzg/cloth-funnels/cloth_funnels/run_sim.py", line 114, in main
    envs, _ = setup_envs(dataset=dataset_path, **args)
  File "/home/rzg/cloth-funnels/cloth_funnels/utils/utils.py", line 145, in setup_envs
    ray.get([e.setup_ray.remote(e) for e in envs])
  File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/rzg/anaconda3/envs/cloth-funnels/lib/python3.9/site-packages/ray/_private/worker.py", line 2382, in get
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
class_name: SimEnv
actor_id: 3eb819befb555444286775b101000000
pid: 1111702
namespace: 7ae3f5db-e73d-4521-8a05-5b0dbdc2a5ee
ip: 10.202.9.99
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
The actor never ran - it was cancelled before it started running.
Any idea how to fix this?
Thanks
