@allenwang28 (Contributor) commented Oct 15, 2025

Reverting commit 1f45470 until issues/tests are resolved

Example error we saw before:

thread '<unnamed>' panicked at fbcode/monarch/hyperactor/src/reference.rs:652:14:
world_name() called on direct proc
[-]E1015 15:33:56.344937  2041 fbcode/monarch/hyperactor/src/panic_handler.rs:47] panic at fbcode/monarch/hyperactor/src/reference.rs:657:23, stacktrace:   0: std::panicking::update_hook::<hyperactor::panic_handler::set_panic_hook::{closure#0}>::{closure#0}
             at ./fbcode/monarch/hyperactor/src/panic_handler.rs:31:25
   1: std::panicking::rust_panic_with_hook
             at ./xplat/rust/toolchain/sysroot/1.90.0/library/alloc/src/boxed.rs:1985:9
   2: std::panicking::begin_panic_handler::{closure#0}
             at ./xplat/rust/toolchain/sysroot/1.90.0/library/std/src/panicking.rs:706:13
   3: std::sys::backtrace::__rust_end_short_backtrace::<std::panicking::begin_panic_handler::{closure#0}, !>
             at ./xplat/rust/toolchain/sysroot/1.90.0/library/std/src/sys/backtrace.rs:174:18
   4: __rustc::rust_begin_unwind
             at ./xplat/rust/toolchain/sysroot/1.90.0/library/std/src/panicking.rs:697:5
   5: core::panicking::panic_fmt
             at ./xplat/rust/toolchain/sysroot/1.90.0/library/core/src/panicking.rs:75:14
   6: core::option::expect_failed
             at ./xplat/rust/toolchain/sysroot/1.90.0/library/core/src/panicking.rs:268:5
   7: <hyperactor::reference::ActorId>::rank
             at ./xplat/rust/toolchain/sysroot/1.90.0/library/core/src/option.rs:964:21
   8: <monarch_hyperactor::proc::PyActorId>::__pymethod_get_rank__
             at ./fbcode/monarch/monarch_hyperactor/src/proc.rs:320:20
   9: <pyo3::pyclass::create_type_object::GetSetDefType>::create_py_get_set_def::getter
             at ./third-party/rust/vendor/pyo3-0.24.2/src/pyclass/create_type_object.rs:653:50
  10: _PyObject_GenericGetAttrWithDict
             at /usr/local/src/conda/python-3.10.18/Objects/object.c:1254:19
  11: PyObject_GenericGetAttr
             at /usr/local/src/conda/python-3.10.18/Objects/object.c:1335:12
  12: PyObject_GetAttr
             at /usr/local/src/conda/python-3.10.18/Objects/object.c:932:19
  13: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:3592:19
  14: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.18/Include/internal/pycore_ceval.h:46:12
  15: _PyEval_Vector
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:5067:24
  16: _PyFunction_Vectorcall
             at /usr/local/src/conda/python-3.10.18/Objects/call.c:342:16
  17: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.18/Include/cpython/abstract.h:114:11
  18: PyObject_Vectorcall
             at /usr/local/src/conda/python-3.10.18/Include/cpython/abstract.h:123:12
  19: call_function
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:5893:13
  20: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:4231:19
  21: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.18/Include/internal/pycore_ceval.h:46:12
  22: gen_send_ex2
             at /usr/local/src/conda/python-3.10.18/Objects/genobject.c:213:14
  23: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:2586:30
  24: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.18/Include/internal/pycore_ceval.h:46:12
  25: gen_send_ex2
             at /usr/local/src/conda/python-3.10.18/Objects/genobject.c:213:14
  26: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:2586:30
  27: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.18/Include/internal/pycore_ceval.h:46:12
  28: gen_send_ex2
             at /usr/local/src/conda/python-3.10.18/Objects/genobject.c:213:14
  29: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:2586:30
  30: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.18/Include/internal/pycore_ceval.h:46:12
  31: gen_send_ex2
             at /usr/local/src/conda/python-3.10.18/Objects/genobject.c:213:14
  32: task_step_impl
             at /usr/local/src/conda/python-3.10.18/Modules/_asynciomodule.c:2653:22
  33: task_step
             at /usr/local/src/conda/python-3.10.18/Modules/_asynciomodule.c:2950:11
  34: _PyObject_MakeTpCall
             at /usr/local/src/conda/python-3.10.18/Objects/call.c:215:18
  35: context_run
             at /usr/local/src/conda/python-3.10.18/Python/context.c:665:0
  36: cfunction_vectorcall_FASTCALL_KEYWORDS
             at /usr/local/src/conda/python-3.10.18/Objects/methodobject.c:446:24
  37: do_call_core
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:5917:9
  38: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:4277:22
  39: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.18/Include/internal/pycore_ceval.h:46:12
  40: _PyEval_Vector
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:5067:24
  41: _PyFunction_Vectorcall
             at /usr/local/src/conda/python-3.10.18/Objects/call.c:342:16
  42: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.18/Include/cpython/abstract.h:114:11
  43: PyObject_Vectorcall
             at /usr/local/src/conda/python-3.10.18/Include/cpython/abstract.h:123:12
  44: call_function
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:5893:13
  45: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:4198:23
  46: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.18/Include/internal/pycore_ceval.h:46:12
  47: _PyEval_Vector
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:5067:24
  48: _PyFunction_Vectorcall
             at /usr/local/src/conda/python-3.10.18/Objects/call.c:342:16
  49: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.18/Include/cpython/abstract.h:114:11
  50: PyObject_Vectorcall
             at /usr/local/src/conda/python-3.10.18/Include/cpython/abstract.h:123:12
  51: call_function
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:5893:13
  52: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:4198:23
  53: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.18/Include/internal/pycore_ceval.h:46:12
  54: _PyEval_Vector
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:5067:24
  55: _PyFunction_Vectorcall
             at /usr/local/src/conda/python-3.10.18/Objects/call.c:342:16
  56: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.18/Include/cpython/abstract.h:114:11
  57: method_vectorcall
             at /usr/local/src/conda/python-3.10.18/Objects/classobject.c:61:20
  58: do_call_core
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:5945:12
  59: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:4277:22
  60: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.18/Include/internal/pycore_ceval.h:46:12
  61: _PyEval_Vector
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:5067:24
  62: _PyFunction_Vectorcall
             at /usr/local/src/conda/python-3.10.18/Objects/call.c:342:16
  63: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.18/Include/cpython/abstract.h:114:11
  64: PyObject_Vectorcall
             at /usr/local/src/conda/python-3.10.18/Include/cpython/abstract.h:123:12
  65: call_function
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:5893:13
  66: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:4198:23
  67: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.18/Include/internal/pycore_ceval.h:46:12
  68: _PyEval_Vector
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:5067:24
  69: _PyFunction_Vectorcall
             at /usr/local/src/conda/python-3.10.18/Objects/call.c:342:16
  70: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.18/Include/cpython/abstract.h:114:11
  71: PyObject_Vectorcall
             at /usr/local/src/conda/python-3.10.18/Include/cpython/abstract.h:123:12
  72: call_function
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:5893:13
  73: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:4198:23
  74: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.18/Include/internal/pycore_ceval.h:46:12
  75: _PyEval_Vector
             at /usr/local/src/conda/python-3.10.18/Python/ceval.c:5067:24
  76: _PyFunction_Vectorcall
             at /usr/local/src/conda/python-3.10.18/Objects/call.c:342:16
  77: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.18/Include/cpython/abstract.h:114:11
  78: method_vectorcall
             at /usr/local/src/conda/python-3.10.18/Objects/classobject.c:61:20
  79: thread_run
             at /usr/local/src/conda/python-3.10.18/Modules/_threadmodule.c:1100:21
  80: pythread_wrapper
             at /usr/local/src/conda/python-3.10.18/Python/thread_pthread.h:248:5
  81: start_thread
             at /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434:8
  82: __clone3
             at /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81:0

thread '<unnamed>' panicked at hyperactor/src/reference.rs:652:14:
world_name() called on direct proc
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

We saw this panic multiple times, every time we ran grpo main. I also see unit tests failing, such as those in #409, even though that PR had nothing to do with them. Checking the CI here to see whether it's related.

@meta-cla bot added the CLA Signed label Oct 15, 2025
@joecummings (Member) commented

Can you share some more details about the current errors and how you bisected them to this commit?

@allenwang28 allenwang28 merged commit 633b219 into meta-pytorch:main Oct 15, 2025
9 checks passed
@allenwang28 allenwang28 deleted the revert_log branch October 15, 2025 22:53
felipemello1 pushed a commit to felipemello1/forge that referenced this pull request Oct 17, 2025