
Conversation

@makslevental (Contributor) commented Aug 23, 2025

Historical context: PyMlirContext::liveOperations was an optimization meant to cut down on the number of Python object allocations and (partially) a mechanism for updating the validity of ops after transformation, e.g. during walking/transforming the AST. See the original patch here.

Inspired by a renewed interest in #139721 (which has become a little stale...)

![image](https://github.com/user-attachments/assets/0daad562-d3d1-4876-8d01-5dba382ab186)

In the previous go-around (#92631) there were two issues, both of which have been resolved:

  1. ops that were "fetched" under a root op which has since been transformed are no longer reported as invalid; we simply "formally forbid" this usage;

  2. Module._CAPICreate(module_capsule) must now be followed by a module._clear_mlir_module() to prevent double-freeing of the actual ModuleOp object (i.e. calling the dtor on the OwningOpRef<ModuleOp>):

    module = ...
    module_dup = Module._CAPICreate(module._CAPIPtr)
    module._clear_mlir_module()
    • the alternative choice here is to remove the Module._CAPICreate API altogether and replace it with something like Module._move(module), which would do both the Module._CAPICreate and the module._clear_mlir_module (a sketch of such a helper follows below).
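
A minimal sketch of the semantics such a Module._move helper could have (hypothetical name module_move; expressed in terms of the two existing calls above):

    from mlir.ir import Module

    def module_move(module):
        # Transfer ownership of the underlying ModuleOp to a fresh wrapper...
        new_module = Module._CAPICreate(module._CAPIPtr)
        # ...then detach the old wrapper so its dtor does not also destroy the
        # OwningOpRef<ModuleOp> (the double-free described above).
        module._clear_mlir_module()
        return new_module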

Note: the other approach I explored last year was a weakref system for mlir::Operation, which would effectively hoist this liveOperations mechanism into MLIR core. Possibly doable, but I now believe it's a bad idea.

The other potentially breaking change: `is`, which checks object identity rather than value equality, will now report `False` because we are always allocating new Python objects (i.e., that's the whole point of this change). Users wanting to check equality for `Operation` and `Module` should use `==` instead.
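
For illustration, a minimal sketch of the new behavior (a hypothetical snippet, not a test from this PR):

    from mlir.ir import Context, Module

    with Context():
        m = Module.parse("module {}")
        op1 = m.operation
        op2 = m.operation
        # Each access now allocates a fresh Python wrapper, so identity differs...
        assert op1 is not op2
        # ...while value equality (mlirOperationEqual underneath) still holds.
        assert op1 == op2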

cc @0xMihir @hawkinsp @superbobry @christopherbate @chhzh123

EDIT: Sanity check: no one is using Module._CAPICreate

@makslevental changed the title [mlir][python] wip remove liveOpeartions → [mlir][python] remove liveOpeartions on Aug 23, 2025
@makslevental force-pushed the remove_live_ops branch 8 times, most recently from 8c71176 to 2a0e205 on August 23, 2025 20:19
@makslevental requested a review from jpienaar on August 23, 2025 20:30
@makslevental marked this pull request as ready for review on August 23, 2025 20:31
@llvmbot added the mlir label on Aug 23, 2025
@llvmbot (Member) commented Aug 23, 2025

@llvm/pr-subscribers-mlir

Author: Maksim Levental (makslevental)

Changes

Inspired by a renewed interest in #139721 (which has become a little stale...)

![image](https://github.com/user-attachments/assets/0daad562-d3d1-4876-8d01-5dba382ab186)

In the previous go-around (#92631) there were two issues, both of which have been resolved:

  1. ops that were "fetched" under a root op that gets transformed are no longer invalid. We "formally forbid" this.
  2. Module._CAPICreate(module_capsule) must now be followed by a module._clear_referrent() to prevent double-freeing of the actual ModuleOp object (i.e. calling the dtor on the OwningOpRef<ModuleOp>).

The other potentially breaking change is that `is` is no longer supported (because we are now allocating new Python objects; that's the whole point of this change) and must be replaced with `==`.

cc @0xMihir @hawkinsp @superbobry


Patch is 22.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155114.diff

10 Files Affected:

  • (modified) mlir/include/mlir-c/IR.h (+2)
  • (modified) mlir/lib/Bindings/Python/IRCore.cpp (+24-148)
  • (modified) mlir/lib/Bindings/Python/IRModule.h (+2-53)
  • (modified) mlir/lib/Bindings/Python/Pass.cpp (+2-6)
  • (modified) mlir/lib/Bindings/Python/TransformInterpreter.cpp (-1)
  • (modified) mlir/lib/CAPI/IR/IR.cpp (+4)
  • (modified) mlir/test/python/ir/module.py (+2-18)
  • (modified) mlir/test/python/ir/operation.py (+1-1)
  • (modified) mlir/test/python/ir/symbol_table.py (-8)
  • (modified) mlir/test/python/pass_manager.py (+1-47)
diff --git a/mlir/include/mlir-c/IR.h b/mlir/include/mlir-c/IR.h
index 71c7d4378677f..d20b46651383f 100644
--- a/mlir/include/mlir-c/IR.h
+++ b/mlir/include/mlir-c/IR.h
@@ -415,6 +415,8 @@ MLIR_CAPI_EXPORTED MlirOperation mlirModuleGetOperation(MlirModule module);
 /// The returned module is null when the input operation was not a ModuleOp.
 MLIR_CAPI_EXPORTED MlirModule mlirModuleFromOperation(MlirOperation op);
 
+MLIR_CAPI_EXPORTED bool mlirModuleEqual(MlirModule mod, MlirModule other);
+
 //===----------------------------------------------------------------------===//
 // Operation state.
 //===----------------------------------------------------------------------===//
diff --git a/mlir/lib/Bindings/Python/IRCore.cpp b/mlir/lib/Bindings/Python/IRCore.cpp
index 4b3a06cbce854..23c4ec7c7f944 100644
--- a/mlir/lib/Bindings/Python/IRCore.cpp
+++ b/mlir/lib/Bindings/Python/IRCore.cpp
@@ -702,84 +702,6 @@ size_t PyMlirContext::getLiveCount() {
   return getLiveContexts().size();
 }
 
-size_t PyMlirContext::getLiveOperationCount() {
-  nb::ft_lock_guard lock(liveOperationsMutex);
-  return liveOperations.size();
-}
-
-std::vector<PyOperation *> PyMlirContext::getLiveOperationObjects() {
-  std::vector<PyOperation *> liveObjects;
-  nb::ft_lock_guard lock(liveOperationsMutex);
-  for (auto &entry : liveOperations)
-    liveObjects.push_back(entry.second.second);
-  return liveObjects;
-}
-
-size_t PyMlirContext::clearLiveOperations() {
-
-  LiveOperationMap operations;
-  {
-    nb::ft_lock_guard lock(liveOperationsMutex);
-    std::swap(operations, liveOperations);
-  }
-  for (auto &op : operations)
-    op.second.second->setInvalid();
-  size_t numInvalidated = operations.size();
-  return numInvalidated;
-}
-
-void PyMlirContext::clearOperation(MlirOperation op) {
-  PyOperation *py_op;
-  {
-    nb::ft_lock_guard lock(liveOperationsMutex);
-    auto it = liveOperations.find(op.ptr);
-    if (it == liveOperations.end()) {
-      return;
-    }
-    py_op = it->second.second;
-    liveOperations.erase(it);
-  }
-  py_op->setInvalid();
-}
-
-void PyMlirContext::clearOperationsInside(PyOperationBase &op) {
-  typedef struct {
-    PyOperation &rootOp;
-    bool rootSeen;
-  } callBackData;
-  callBackData data{op.getOperation(), false};
-  // Mark all ops below the op that the passmanager will be rooted
-  // at (but not op itself - note the preorder) as invalid.
-  MlirOperationWalkCallback invalidatingCallback = [](MlirOperation op,
-                                                      void *userData) {
-    callBackData *data = static_cast<callBackData *>(userData);
-    if (LLVM_LIKELY(data->rootSeen))
-      data->rootOp.getOperation().getContext()->clearOperation(op);
-    else
-      data->rootSeen = true;
-    return MlirWalkResult::MlirWalkResultAdvance;
-  };
-  mlirOperationWalk(op.getOperation(), invalidatingCallback,
-                    static_cast<void *>(&data), MlirWalkPreOrder);
-}
-void PyMlirContext::clearOperationsInside(MlirOperation op) {
-  PyOperationRef opRef = PyOperation::forOperation(getRef(), op);
-  clearOperationsInside(opRef->getOperation());
-}
-
-void PyMlirContext::clearOperationAndInside(PyOperationBase &op) {
-  MlirOperationWalkCallback invalidatingCallback = [](MlirOperation op,
-                                                      void *userData) {
-    PyMlirContextRef &contextRef = *static_cast<PyMlirContextRef *>(userData);
-    contextRef->clearOperation(op);
-    return MlirWalkResult::MlirWalkResultAdvance;
-  };
-  mlirOperationWalk(op.getOperation(), invalidatingCallback,
-                    &op.getOperation().getContext(), MlirWalkPreOrder);
-}
-
-size_t PyMlirContext::getLiveModuleCount() { return liveModules.size(); }
-
 nb::object PyMlirContext::contextEnter(nb::object context) {
   return PyThreadContextEntry::pushContext(context);
 }
@@ -1151,38 +1073,20 @@ PyLocation &DefaultingPyLocation::resolve() {
 PyModule::PyModule(PyMlirContextRef contextRef, MlirModule module)
     : BaseContextObject(std::move(contextRef)), module(module) {}
 
-PyModule::~PyModule() {
-  nb::gil_scoped_acquire acquire;
-  auto &liveModules = getContext()->liveModules;
-  assert(liveModules.count(module.ptr) == 1 &&
-         "destroying module not in live map");
-  liveModules.erase(module.ptr);
-  mlirModuleDestroy(module);
-}
+PyModule::~PyModule() { mlirModuleDestroy(module); }
 
 PyModuleRef PyModule::forModule(MlirModule module) {
   MlirContext context = mlirModuleGetContext(module);
   PyMlirContextRef contextRef = PyMlirContext::forContext(context);
 
-  nb::gil_scoped_acquire acquire;
-  auto &liveModules = contextRef->liveModules;
-  auto it = liveModules.find(module.ptr);
-  if (it == liveModules.end()) {
-    // Create.
-    PyModule *unownedModule = new PyModule(std::move(contextRef), module);
-    // Note that the default return value policy on cast is automatic_reference,
-    // which does not take ownership (delete will not be called).
-    // Just be explicit.
-    nb::object pyRef = nb::cast(unownedModule, nb::rv_policy::take_ownership);
-    unownedModule->handle = pyRef;
-    liveModules[module.ptr] =
-        std::make_pair(unownedModule->handle, unownedModule);
-    return PyModuleRef(unownedModule, std::move(pyRef));
-  }
-  // Use existing.
-  PyModule *existing = it->second.second;
-  nb::object pyRef = nb::borrow<nb::object>(it->second.first);
-  return PyModuleRef(existing, std::move(pyRef));
+  // Create.
+  PyModule *unownedModule = new PyModule(std::move(contextRef), module);
+  // Note that the default return value policy on cast is automatic_reference,
+  // which does not take ownership (delete will not be called).
+  // Just be explicit.
+  nb::object pyRef = nb::cast(unownedModule, nb::rv_policy::take_ownership);
+  unownedModule->handle = pyRef;
+  return PyModuleRef(unownedModule, std::move(pyRef));
 }
 
 nb::object PyModule::createFromCapsule(nb::object capsule) {
@@ -1207,16 +1111,8 @@ PyOperation::~PyOperation() {
   // If the operation has already been invalidated there is nothing to do.
   if (!valid)
     return;
-
-  // Otherwise, invalidate the operation and remove it from live map when it is
-  // attached.
-  if (isAttached()) {
-    getContext()->clearOperation(*this);
-  } else {
-    // And destroy it when it is detached, i.e. owned by Python, in which case
-    // all nested operations must be invalidated at removed from the live map as
-    // well.
-    erase();
+  if (!isAttached()) {
+    mlirOperationDestroy(operation);
   }
 }
 
@@ -1252,35 +1148,16 @@ PyOperationRef PyOperation::createInstance(PyMlirContextRef contextRef,
 PyOperationRef PyOperation::forOperation(PyMlirContextRef contextRef,
                                          MlirOperation operation,
                                          nb::object parentKeepAlive) {
-  nb::ft_lock_guard lock(contextRef->liveOperationsMutex);
-  auto &liveOperations = contextRef->liveOperations;
-  auto it = liveOperations.find(operation.ptr);
-  if (it == liveOperations.end()) {
-    // Create.
-    PyOperationRef result = createInstance(std::move(contextRef), operation,
-                                           std::move(parentKeepAlive));
-    liveOperations[operation.ptr] =
-        std::make_pair(result.getObject(), result.get());
-    return result;
-  }
-  // Use existing.
-  PyOperation *existing = it->second.second;
-  nb::object pyRef = nb::borrow<nb::object>(it->second.first);
-  return PyOperationRef(existing, std::move(pyRef));
+  // Create.
+  return createInstance(std::move(contextRef), operation,
+                        std::move(parentKeepAlive));
 }
 
 PyOperationRef PyOperation::createDetached(PyMlirContextRef contextRef,
                                            MlirOperation operation,
                                            nb::object parentKeepAlive) {
-  nb::ft_lock_guard lock(contextRef->liveOperationsMutex);
-  auto &liveOperations = contextRef->liveOperations;
-  assert(liveOperations.count(operation.ptr) == 0 &&
-         "cannot create detached operation that already exists");
-  (void)liveOperations;
   PyOperationRef created = createInstance(std::move(contextRef), operation,
                                           std::move(parentKeepAlive));
-  liveOperations[operation.ptr] =
-      std::make_pair(created.getObject(), created.get());
   created->attached = false;
   return created;
 }
@@ -1652,7 +1529,6 @@ nb::object PyOperation::createOpView() {
 
 void PyOperation::erase() {
   checkValid();
-  getContext()->clearOperationAndInside(*this);
   mlirOperationDestroy(operation);
 }
 
@@ -3023,14 +2899,6 @@ void mlir::python::populateIRCore(nb::module_ &m) {
              PyMlirContextRef ref = PyMlirContext::forContext(self.get());
              return ref.releaseObject();
            })
-      .def("_get_live_operation_count", &PyMlirContext::getLiveOperationCount)
-      .def("_get_live_operation_objects",
-           &PyMlirContext::getLiveOperationObjects)
-      .def("_clear_live_operations", &PyMlirContext::clearLiveOperations)
-      .def("_clear_live_operations_inside",
-           nb::overload_cast<MlirOperation>(
-               &PyMlirContext::clearOperationsInside))
-      .def("_get_live_module_count", &PyMlirContext::getLiveModuleCount)
       .def_prop_ro(MLIR_PYTHON_CAPI_PTR_ATTR, &PyMlirContext::getCapsule)
       .def(MLIR_PYTHON_CAPI_FACTORY_ATTR, &PyMlirContext::createFromCapsule)
       .def("__enter__", &PyMlirContext::contextEnter)
@@ -3349,6 +3217,7 @@ void mlir::python::populateIRCore(nb::module_ &m) {
   nb::class_<PyModule>(m, "Module", nb::is_weak_referenceable())
       .def_prop_ro(MLIR_PYTHON_CAPI_PTR_ATTR, &PyModule::getCapsule)
       .def(MLIR_PYTHON_CAPI_FACTORY_ATTR, &PyModule::createFromCapsule)
+      .def("_clear_referrent", &PyModule::clearReferrent)
       .def_static(
           "parse",
           [](const std::string &moduleAsm, DefaultingPyMlirContext context) {
@@ -3428,7 +3297,13 @@ void mlir::python::populateIRCore(nb::module_ &m) {
             // Defer to the operation's __str__.
             return self.attr("operation").attr("__str__")();
           },
-          kOperationStrDunderDocstring);
+          kOperationStrDunderDocstring)
+      .def(
+          "__eq__",
+          [](PyModule &self, PyModule &other) {
+            return mlirModuleEqual(self.get(), other.get());
+          },
+          "other"_a);
 
   //----------------------------------------------------------------------------
   // Mapping of Operation.
@@ -3440,7 +3315,8 @@ void mlir::python::populateIRCore(nb::module_ &m) {
                    })
       .def("__eq__",
            [](PyOperationBase &self, PyOperationBase &other) {
-             return &self.getOperation() == &other.getOperation();
+             return mlirOperationEqual(self.getOperation().get(),
+                                       other.getOperation().get());
            })
       .def("__eq__",
            [](PyOperationBase &self, nb::object other) { return false; })
diff --git a/mlir/lib/Bindings/Python/IRModule.h b/mlir/lib/Bindings/Python/IRModule.h
index fa16ae3ce3294..e1a71c627a902 100644
--- a/mlir/lib/Bindings/Python/IRModule.h
+++ b/mlir/lib/Bindings/Python/IRModule.h
@@ -218,40 +218,6 @@ class PyMlirContext {
   /// Gets the count of live context objects. Used for testing.
   static size_t getLiveCount();
 
-  /// Get a list of Python objects which are still in the live context map.
-  std::vector<PyOperation *> getLiveOperationObjects();
-
-  /// Gets the count of live operations associated with this context.
-  /// Used for testing.
-  size_t getLiveOperationCount();
-
-  /// Clears the live operations map, returning the number of entries which were
-  /// invalidated. To be used as a safety mechanism so that API end-users can't
-  /// corrupt by holding references they shouldn't have accessed in the first
-  /// place.
-  size_t clearLiveOperations();
-
-  /// Removes an operation from the live operations map and sets it invalid.
-  /// This is useful for when some non-bindings code destroys the operation and
-  /// the bindings need to made aware. For example, in the case when pass
-  /// manager is run.
-  ///
-  /// Note that this does *NOT* clear the nested operations.
-  void clearOperation(MlirOperation op);
-
-  /// Clears all operations nested inside the given op using
-  /// `clearOperation(MlirOperation)`.
-  void clearOperationsInside(PyOperationBase &op);
-  void clearOperationsInside(MlirOperation op);
-
-  /// Clears the operaiton _and_ all operations inside using
-  /// `clearOperation(MlirOperation)`.
-  void clearOperationAndInside(PyOperationBase &op);
-
-  /// Gets the count of live modules associated with this context.
-  /// Used for testing.
-  size_t getLiveModuleCount();
-
   /// Enter and exit the context manager.
   static nanobind::object contextEnter(nanobind::object context);
   void contextExit(const nanobind::object &excType,
@@ -278,25 +244,6 @@ class PyMlirContext {
   static nanobind::ft_mutex live_contexts_mutex;
   static LiveContextMap &getLiveContexts();
 
-  // Interns all live modules associated with this context. Modules tracked
-  // in this map are valid. When a module is invalidated, it is removed
-  // from this map, and while it still exists as an instance, any
-  // attempt to access it will raise an error.
-  using LiveModuleMap =
-      llvm::DenseMap<const void *, std::pair<nanobind::handle, PyModule *>>;
-  LiveModuleMap liveModules;
-
-  // Interns all live operations associated with this context. Operations
-  // tracked in this map are valid. When an operation is invalidated, it is
-  // removed from this map, and while it still exists as an instance, any
-  // attempt to access it will raise an error.
-  using LiveOperationMap =
-      llvm::DenseMap<void *, std::pair<nanobind::handle, PyOperation *>>;
-  nanobind::ft_mutex liveOperationsMutex;
-
-  // Guarded by liveOperationsMutex in free-threading mode.
-  LiveOperationMap liveOperations;
-
   bool emitErrorDiagnostics = false;
 
   MlirContext context;
@@ -575,6 +522,8 @@ class PyModule : public BaseContextObject {
   /// is taken by calling this function.
   static nanobind::object createFromCapsule(nanobind::object capsule);
 
+  void clearReferrent() { module = {nullptr}; }
+
 private:
   PyModule(PyMlirContextRef contextRef, MlirModule module);
   MlirModule module;
diff --git a/mlir/lib/Bindings/Python/Pass.cpp b/mlir/lib/Bindings/Python/Pass.cpp
index 20017e25b69bb..817479ee2421b 100644
--- a/mlir/lib/Bindings/Python/Pass.cpp
+++ b/mlir/lib/Bindings/Python/Pass.cpp
@@ -159,11 +159,7 @@ void mlir::python::populatePassManagerSubmodule(nb::module_ &m) {
           "ValueError if the pipeline can't be parsed.")
       .def(
           "run",
-          [](PyPassManager &passManager, PyOperationBase &op,
-             bool invalidateOps) {
-            if (invalidateOps) {
-              op.getOperation().getContext()->clearOperationsInside(op);
-            }
+          [](PyPassManager &passManager, PyOperationBase &op) {
             // Actually run the pass manager.
             PyMlirContext::ErrorCapture errors(op.getOperation().getContext());
             MlirLogicalResult status = mlirPassManagerRunOnOp(
@@ -172,7 +168,7 @@ void mlir::python::populatePassManagerSubmodule(nb::module_ &m) {
               throw MLIRError("Failure while executing pass pipeline",
                               errors.take());
           },
-          "operation"_a, "invalidate_ops"_a = true,
+          "operation"_a,
           "Run the pass manager on the provided operation, raising an "
           "MLIRError on failure.")
       .def(
diff --git a/mlir/lib/Bindings/Python/TransformInterpreter.cpp b/mlir/lib/Bindings/Python/TransformInterpreter.cpp
index f9b0fed62778f..920bca886f617 100644
--- a/mlir/lib/Bindings/Python/TransformInterpreter.cpp
+++ b/mlir/lib/Bindings/Python/TransformInterpreter.cpp
@@ -67,7 +67,6 @@ static void populateTransformInterpreterSubmodule(nb::module_ &m) {
         // root. This is awkward, but we don't have access to PyMlirContext
         // object here otherwise.
         nb::object obj = nb::cast(payloadRoot);
-        obj.attr("context").attr("_clear_live_operations_inside")(payloadRoot);
 
         MlirLogicalResult result = mlirTransformApplyNamedSequence(
             payloadRoot, transformRoot, transformModule, options.options);
diff --git a/mlir/lib/CAPI/IR/IR.cpp b/mlir/lib/CAPI/IR/IR.cpp
index 8491553dab76f..7ea799c836889 100644
--- a/mlir/lib/CAPI/IR/IR.cpp
+++ b/mlir/lib/CAPI/IR/IR.cpp
@@ -465,6 +465,10 @@ MlirModule mlirModuleFromOperation(MlirOperation op) {
   return wrap(dyn_cast<ModuleOp>(unwrap(op)));
 }
 
+bool mlirModuleEqual(MlirModule mod, MlirModule other) {
+  return unwrap(mod) == unwrap(other);
+}
+
 //===----------------------------------------------------------------------===//
 // Operation state API.
 //===----------------------------------------------------------------------===//
diff --git a/mlir/test/python/ir/module.py b/mlir/test/python/ir/module.py
index 6065e59fd6ed9..4a9cf9d44a8aa 100644
--- a/mlir/test/python/ir/module.py
+++ b/mlir/test/python/ir/module.py
@@ -121,27 +121,16 @@ def testRoundtripBinary():
 def testModuleOperation():
     ctx = Context()
     module = Module.parse(r"""module @successfulParse {}""", ctx)
-    assert ctx._get_live_module_count() == 1
     op1 = module.operation
-    assert ctx._get_live_operation_count() == 1
-    live_ops = ctx._get_live_operation_objects()
-    assert len(live_ops) == 1
-    assert live_ops[0] is op1
-    live_ops = None
     # CHECK: module @successfulParse
     print(op1)
 
     # Ensure that operations are the same on multiple calls.
     op2 = module.operation
-    assert ctx._get_live_operation_count() == 1
-    assert op1 is op2
+    assert op1 == op2
 
     # Test live operation clearing.
     op1 = module.operation
-    assert ctx._get_live_operation_count() == 1
-    num_invalidated = ctx._clear_live_operations()
-    assert num_invalidated == 1
-    assert ctx._get_live_operation_count() == 0
     op1 = None
     gc.collect()
     op1 = module.operation
@@ -155,9 +144,6 @@ def testModuleOperation():
     op1 = None
     op2 = None
     gc.collect()
-    print("LIVE OPERATIONS:", ctx._get_live_operation_count())
-    assert ctx._get_live_operation_count() == 0
-    assert ctx._get_live_module_count() == 0
 
 
 # CHECK-LABEL: TEST: testModuleCapsule
@@ -165,16 +151,14 @@ def testModuleOperation():
 def testModuleCapsule():
     ctx = Context()
     module = Module.parse(r"""module @successfulParse {}""", ctx)
-    assert ctx._get_live_module_count() == 1
     # CHECK: "mlir.ir.Module._CAPIPtr"
     module_capsule = module._CAPIPtr
     print(module_capsule)
     module_dup = Module._CAPICreate(module_capsule)
-    assert module is module_dup
+    module._clear_referrent()
     assert module_dup.context is ctx
     # Gc and verify destructed.
     module = None
     module_capsule = None
     module_dup = None
     gc.collect()
-    assert ctx._get_live_module_count() == 0
diff --git a/mlir/test/python/ir/operation.py b/mlir/test/python/ir/operation.py
index bf16e3f75d60d..c95a97b9731a2 100644
--- a/mlir/test/python/ir/operation.py
+++ b/mlir/test/python/ir/operation.py
@@ -907,7 +907,7 @@ def testCapsuleConversions():
         m_capsule = m._CAPIPtr
         assert '"mlir.ir.Operation._CAPIPtr"' in repr(m_capsule)
         m2 = Operation._CAPICreate(m_capsule)
-        assert m2 is m
+        assert m2 == m
 
 
 # CHECK-LABEL: TEST: testOperationErase
diff --git a/mlir/test/python/ir/symbol_table.py b/mlir/test/python/ir/symbol_table.py
index 8b6d7ea5a197d..7afd539271d21 100644
--- a/mlir/test/python/ir/symbol_table.py
+++ b/mlir/test/python/ir/symbol_table.py
@@ -56,14 +56,6 @@ def testSymbolTableInsert():
         print(m1)
         assert "bar" not in symbol_table
 
-        try:
-            print(bar)
-        except RuntimeError as e:
-            if "the operation has been invalidated" not in str(e):
-                raise
-        else:
-            assert False, "expected RuntimeError due to invalidated operation"
-
         qux = m2.body.operations[0]
         m1.body.append(qux)
         symbol_table.insert(qux)
diff --git a/mlir/test/python/pass_manager.py b/mlir/test/python/pass_manager.py
index e26d42bb32913..0896cd9784641 100644
--- a/mlir/test/python/pass_manager.py
...
[truncated]

@joker-eph changed the title [mlir][python] remove liveOpeartions → [mlir][python] remove liveOperations on Aug 23, 2025
@makslevental force-pushed the remove_live_ops branch 3 times, most recently from 3de0c41 to b32e0f3 on August 23, 2025 21:12
@makslevental changed the title [mlir][python] remove liveOperations → [MLIR][Python] remove liveOperations on Aug 23, 2025
@makslevental force-pushed the remove_live_ops branch 5 times, most recently from 6041b0c to 266540a on August 23, 2025 22:00
@makslevental (Contributor, Author)

I kicked off a manual build on sanitizer-x86_64-linux-bootstrap-asan

@makslevental merged commit b2a7369 into llvm:main on Sep 2, 2025
9 checks passed
@makslevental deleted the remove_live_ops branch on September 2, 2025 04:53
@llvm-ci (Collaborator) commented Sep 2, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-aarch64-linux-bootstrap-asan running on sanitizer-buildbot7 while building mlir at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/24/builds/12189

Here is the relevant piece of the build log for reference:
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:527: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:527: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:527: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:527: note: using ld.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:527: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:527: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:527: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 90269 tests, 72 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: tools/llvm-exegesis/AArch64/error-resolution.s (77122 of 90269)
******************** TEST 'LLVM :: tools/llvm-exegesis/AArch64/error-resolution.s' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency  --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv4s_msl 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv4s_msl_latency # RUN: at line 8
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv4s_msl
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv4s_msl_latency
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=inverse_throughput --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv4s_msl 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv4s_msl_throughput # RUN: at line 9
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=inverse_throughput --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv4s_msl
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv4s_msl_throughput
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency  --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv2s_msl 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv2s_msl_latency # RUN: at line 24
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv2s_msl
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv2s_msl_latency
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=inverse_throughput --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv2s_msl 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv2s_msl_throughput # RUN: at line 25
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=inverse_throughput --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv2s_msl
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv2s_msl_throughput
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency  --benchmark-phase=prepare-and-assemble-snippet --opcode-name=LDRDl 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=LDRDl_latency # RUN: at line 43
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency --benchmark-phase=prepare-and-assemble-snippet --opcode-name=LDRDl
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=LDRDl_latency
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=inverse_throughput --benchmark-phase=prepare-and-assemble-snippet --opcode-name=LDRDl 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=LDRDl_throughput # RUN: at line 44
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=inverse_throughput --benchmark-phase=prepare-and-assemble-snippet --opcode-name=LDRDl
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=LDRDl_throughput
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency --benchmark-phase=prepare-and-assemble-snippet --opcode-name=UMOVvi16_idx0 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=UMOVvi16_idx0_latency # RUN: at line 61
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency --benchmark-phase=prepare-and-assemble-snippet --opcode-name=UMOVvi16_idx0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=UMOVvi16_idx0_latency
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s:65:26: error: UMOVvi16_idx0_latency: expected string not found in input
# UMOVvi16_idx0_latency: ---
                         ^
<stdin>:1:1: note: scanning from here
UMOVvi16_idx0: Not all operands were initialized by the snippet generator for LD4i8_POST opcode.
^

Input file: <stdin>
Check file: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s

-dump-input=help explains the following input dump.

Step 11 (stage2/asan check) failure: stage2/asan check (failure)
...
Step 14 (stage3/asan check) failure: stage3/asan check (failure)
...
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:527: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:527: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:527: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:527: note: using ld.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:527: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:527: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:527: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 86960 tests, 72 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80..
FAIL: LLVM :: tools/llvm-exegesis/AArch64/error-resolution.s (77121 of 86960)
******************** TEST 'LLVM :: tools/llvm-exegesis/AArch64/error-resolution.s' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency  --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv4s_msl 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv4s_msl_latency # RUN: at line 8
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv4s_msl
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv4s_msl_latency
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=inverse_throughput --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv4s_msl 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv4s_msl_throughput # RUN: at line 9
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=inverse_throughput --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv4s_msl
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv4s_msl_throughput
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency  --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv2s_msl 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv2s_msl_latency # RUN: at line 24
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv2s_msl
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv2s_msl_latency
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=inverse_throughput --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv2s_msl 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv2s_msl_throughput # RUN: at line 25
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=inverse_throughput --benchmark-phase=prepare-and-assemble-snippet --opcode-name=MOVIv2s_msl
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=MOVIv2s_msl_throughput
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency  --benchmark-phase=prepare-and-assemble-snippet --opcode-name=LDRDl 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=LDRDl_latency # RUN: at line 43
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency --benchmark-phase=prepare-and-assemble-snippet --opcode-name=LDRDl
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=LDRDl_latency
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=inverse_throughput --benchmark-phase=prepare-and-assemble-snippet --opcode-name=LDRDl 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=LDRDl_throughput # RUN: at line 44
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=inverse_throughput --benchmark-phase=prepare-and-assemble-snippet --opcode-name=LDRDl
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=LDRDl_throughput
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency --benchmark-phase=prepare-and-assemble-snippet --opcode-name=UMOVvi16_idx0 2>&1 | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=UMOVvi16_idx0_latency # RUN: at line 61
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/llvm-exegesis --mtriple=aarch64 --mcpu=neoverse-v2 --mode=latency --benchmark-phase=prepare-and-assemble-snippet --opcode-name=UMOVvi16_idx0
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s --check-prefix=UMOVvi16_idx0_latency
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s:65:26: error: UMOVvi16_idx0_latency: expected string not found in input
# UMOVvi16_idx0_latency: ---
                         ^
<stdin>:1:1: note: scanning from here
UMOVvi16_idx0: Not all operands were initialized by the snippet generator for LD4B_IMM opcode.
^

Input file: <stdin>
Check file: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/tools/llvm-exegesis/AArch64/error-resolution.s

-dump-input=help explains the following input dump.


@Cubevoid (Contributor) commented Sep 2, 2025

I found that Python dicts using operations as keys are now broken because even though the operations are equal, they do not have the same hash. Should we override the __hash__ method to make sure this still works?
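
For illustration, a minimal sketch of the breakage (hypothetical snippet):

    from mlir.ir import Context, Module

    with Context():
        m = Module.parse("module {}")
        op1 = m.operation
        op2 = m.operation
        cache = {op1: "data"}
        assert op1 == op2
        # op2 is a distinct wrapper whose default hash is identity-based, so
        # the dict lookup misses even though the operations compare equal:
        assert op2 not in cache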

@makslevental (Contributor, Author)

> I found that Python dicts using operations as keys are now broken because even though the operations are equal, they do not have the same hash. Should we override the __hash__ method to make sure this still works?

Good catch. Yes let me do that right now.

makslevental added a commit that referenced this pull request Sep 2, 2025
#155114 broke op hashing
(because the python objects ceased to be reference equivalent). This PR
fixes by binding `OperationEquivalence::computeHash`.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 2, 2025
llvm/llvm-project#155114 broke op hashing
(because the python objects ceased to be reference equivalent). This PR
fixes by binding `OperationEquivalence::computeHash`.
bump-llvm bot pushed a commit to makslevental/python_bindings_fork that referenced this pull request Sep 3, 2025
llvm/llvm-project#155114 broke op hashing
(because the python objects ceased to be reference equivalent). This PR
fixes by binding `OperationEquivalence::computeHash`.
@teqdruid (Contributor)

This broke me downstream! (PyCDE in the CIRCT repo.)

I was calling _clear_live_operations any time I ran a C++ pass which may have erased operations. Yes, keeping around Python references to deleted C++ objects (operations, specifically) is invalid and a bug. But crashing on a Python API call is not great behavior even if there is a bug in one's code. It's even more annoying to debug Python code that holds an old reference to an operation which has been deleted while a new operation has been allocated at the same address: it doesn't crash, it just provides "incorrect" data via the Python debugger! Speaking of the debugger, when a Python program crashes it takes the Python debugger with it (IIRC), which makes for a really fun experience when clicking through the Python object tree in VSCode. I don't see this as acceptable behavior.

So, I was using _clear_live_operations to set the valid flag in PyOperation to false to signal that the Python Operations were no longer valid so as to trigger reasonable behavior when debugging. I was also issuing a warning when it returned a non-zero result. Without this, I have no idea if there's a dangling pointer bug which could explode if I touch (or click) something the wrong way.

I was not using (as in dependent on) the caching behavior of the liveOperations map.

Is there any way for me to get this debugging feature back without a liveOperations map? We'd only have to track the PyOperations which have been allocated but not yet invalidated or destructed.

@makslevental
Contributor Author

makslevental commented Sep 24, 2025

So, I was using _clear_live_operations to set the valid flag in PyOperation to false, signaling that the Python Operations were no longer valid and triggering reasonable behavior when debugging. I was also issuing a warning when it returned a non-zero result. Without this, I have no idea whether there's a dangling-pointer bug which could explode if I touch (or click) something the wrong way.

I added an API _set_invalid with use-cases like yours in mind (but not yours specifically since I wasn't aware of it):

func_op._set_invalid()

So you can invalidate ops by hand. Now, this requires actually being able to "find" them, i.e., they should already have been instantiated on the Python side and be accessible/reachable by you at the point at which you return from the call to the pass manager. The first part was always a requirement (the liveOperations map only ever contained ops that were already "live") but the second part might be challenging.
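
A hedged sketch of the hand-invalidation pattern (tracked_ops, other_op, and run_pipeline are placeholders for user code, not bindings APIs; _set_invalid is the real method):

    tracked_ops = [func_op, other_op]  # handles you already hold
    run_pipeline(module)               # placeholder: any transformation that may erase ops
    for op in tracked_ops:
        op._set_invalid()              # later accesses raise instead of crashing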

@teqdruid
Contributor

and be accessible/reachable by you at the point at which you return from the call to the pass manager.

Yes, this is the trick. I don't have a centralized, fool-proof way of tracking all of the operations which get created. liveOperations gave me that.

Given my statement regarding debug-ability above, is there any interest in adding this to the MLIR Python bindings themselves? Or am I the only one who has scars from this issue?

We'd just have to track the set of all Operations which have been created (either constructed from Python or by the bindings while crawling the IR) and haven't been destructed yet. Seems like that would be easy in the C++ bindings, wouldn't take up more space than liveOperations, and invalidations would be the responsibility of the user.

@makslevental
Contributor Author

makslevental commented Sep 24, 2025

We'd just have to track the set of all Operations which have been created (either constructed from Python or by the bindings while crawling the IR) and haven't been destructed yet. Seems like that would be easy in the C++ bindings, wouldn't take up more space than liveOperations, and invalidations would be the responsibility of the user.

The whole reason for removing liveOperations was that keeping it up to date caused lots of issues: #139721, #93339, #69730, #92344, #63916. So while I'm generally supportive of sharp edges which are the user's responsibility to manage, this one seems just fundamentally unsustainable.

I can suggest two workarounds:

1. You should actually be able to recover all currently live operations/opviews using something like

       import gc

       for obj in gc.get_objects():
           if isinstance(obj, (Operation, OpView)):
               print(obj)

   but actually it's not working for me right now (gc.is_tracked returns False for all ops/opviews and I don't understand why). I will try again tomorrow.

2. You can monkey-patch OpView.__init__ to both init your opview and add it to a global something (a weak-reference variant is sketched below):

       my_live_operation_map = set()
       old_init = OpView.__init__

       def __init__(self, *args, **kwargs):
           old_init(self, *args, **kwargs)
           my_live_operation_map.add(self)

       OpView.__init__ = __init__

   Assuming you're loading dialects correctly, this will work both for OpViews you create in Python yourself (i.e., from circt_ops_gen import Op1; op = Op1()) and OpViews created for you by the bindings (e.g., by doing op.operation.opview or any of the other APIs that ultimately call PyOperation::createOpView()).
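
A variant of workaround 2, sketched under the assumption that OpView instances accept weak references (which may not hold, depending on how the extension type is declared); it avoids pinning every op in memory forever:

    import weakref

    live_opviews = weakref.WeakSet()  # entries vanish when the Python objects die
    _old_init = OpView.__init__

    def _tracking_init(self, *args, **kwargs):
        _old_init(self, *args, **kwargs)
        live_opviews.add(self)  # requires OpView instances to be hashable and weakref-able

    OpView.__init__ = _tracking_init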

Or am I the only one who has scars from this issue?

You're the first to complain 😄 but even if you are the only one, I/we broke you and that's bad. So if these workarounds aren't sufficient, I can keep pondering this, and if worst comes to worst we can restore the liveOperations map in some limited form/fashion. But just reiterating: I would very much prefer to keep it fully removed.

@makslevental
Contributor Author

makslevental commented Sep 24, 2025

Re using gc.get_objects(): this works (copy-pasta from SO):

import gc

def _getr(slist, olist, seen):
    # Recursively append every object reachable from slist to olist.
    for e in slist:
        if id(e) in seen:
            continue
        seen[id(e)] = None
        olist.append(e)
        tl = gc.get_referents(e)
        if tl:
            _getr(tl, olist, seen)

def get_all_objects():
    # Walk everything reachable from the gc roots, excluding our own bookkeeping.
    gcl = gc.get_objects()
    olist = []
    seen = {}
    seen[id(gcl)] = None
    seen[id(olist)] = None
    seen[id(seen)] = None
    _getr(gcl, olist, seen)
    return olist

ops = [op1, op2, op3]  # ops created earlier, held here so they stay reachable

for obj in get_all_objects():
    if isinstance(obj, (Operation, OpView)):
        print(obj)

EDIT: apparently there's a blessed way to do this https://docs.python.org/3/library/sys.html#sys.getobjects but it's behind a compile flag for Python itself. More info about this problem here (linked in the SO).

@teqdruid
Contributor

Going through all of the objects in Python just to find the Operations/OpViews seems a bit heavy-handed. If I do anything, it'll be to monkey-patch as above or find some other way to track them on creation and destruction.

The whole reason for removing liveOperations was that keeping it up to date caused lots of issues: #139721, #93339, #69730, #92344, #63916. So while I'm generally supportive of sharp edges which are the user's responsibility to manage, this one seems just fundamentally unsustainable.

I don't think closing these as "fixed" by claiming this behavior is correct by the new rules is accurate. More of a "wontfix". It's essentially defining the problem away. IMO any Python API call which crashes the interpreter is a legitimate bug and this doesn't address them at all.

My opinion is that we need to be conservative by default (endeavor to never crash at the cost of occasionally disallowing potentially valid things) and let users opt-in to behavior which may cause crashes. For instance, we should invalidate all PyOperations whenever a pass is run (by default). We should invalidate all children of erased ops before entering MLIR land by default. Anything which causes the Python bindings to crash should be treated as a bug which should be addressed -- regardless of the difficulty of fixing or maintenance of the fix. If users know better and want to manage valid state in an application/pass-specific way, they should need to opt-in to that at the Context granularity.

From previous discussion on this issue, @ftynse wrote in #93339:

I was thinking to have a hook in Operation::erase, with the callback itself stored in the context, that could be used to update an externalized weak reference table. When not in use, the cost is just one pointer in the context + a branch in the destructor. When in use, the cost would be higher than just having references in the operation I suppose, depending on the actual implementation of the table, but an indirect call is also not free.

I proposed exactly this a few years ago. (Can't remember exactly when, but I wrote about the feedback here.) Something like this would probably help not just with Python but all managed-memory-language bindings. I continue to think this is a good idea and a mechanism like this is necessary to make pointers "safe" for other languages. Maybe it's time to re-broach this idea?

@makslevental
Contributor Author

makslevental commented Sep 25, 2025

I don't think closing these as "fixed" by claiming this behavior is correct by the new rules is accurate. More of a "wontfix". It's essentially defining the problem away. IMO any Python API call which crashes the interpreter is a legitimate bug and this doesn't address them at all.

I mean only one of these is like that (#63916) - the rest were race conditions/bugs/etc. that were patched/re-patched/re-considered over and over.

My opinion is that we need to be conservative by default (endeavor to never crash at the cost of occasionally disallowing potentially valid things) and let users opt-in to behavior which may cause crashes.

This is the GC vs no-GC argument in different clothing. The problem is we don't have a choice here - C++ has already made the choice for us, right? In C++, users (us devs or whomever) can hold invalidated references and there's no attempt made whatsoever to protect them from it (cf. the dreaded <<NULL VALUE>> or worse) because that's just the understanding around manual memory management in C++. The fact that this percolates up through the bindings isn't due to a lack of care but due to a lack of lower-level facilities - there is no easy way to canonically know when/where/how an object created in C++ land has been destroyed. My conclusion is: how could the bindings be safer than the C++ itself?

For instance, we should invalidate all PyOperations whenever a pass is run (by default). We should invalidate all children of erased ops before entering MLIR land by default. Anything which causes the Python bindings to crash should be treated as a bug which should be addressed -- regardless of the difficulty of fixing or maintenance of the fix. If users know better and want to manage valid state in an application/pass-specific way, they should need to opt-in to that at the Context granularity.

The bindings aren't designed to be used the way you're using them though - they weren't designed for people to be able to run passes and then still be in a valid state. What's been drilled into my head over the years is they were conceived of and designed only to support emitting IR (i.e., a nicer test writing UX). So we're already where you want to be - just assume they are all already invalid always and don't try to do what you're trying to do 😄. Now certainly the bindings have evolved (e.g., thanks to you, thanks to many others) and they do support some light "industrial use" (rewrites and such) but the founding premise is still there - they're not meant to be safe for this kind of use. And just re-stating the above: the user in this instance isn't a DSL user - it's you the compiler developer and you're writing library code (not user code). So I think the expectations are the same as for C++; i.e., the Python library code you write needs to be as conservative as it would need to be if it were just C++ library code.

From previous discussion on this issue, @ftynse wrote in ...

I actually did sketch that out @ #97340. If you take a look you'll notice how much starts to change when you try to implement this - specifically everywhere you return an Operation* you now need to return a weakref instead. I personally felt it was just too much work for not enough ROI (and also that core wouldn't take the PR). If you want to try to RFC that idea (weakref system) I could probably finish up an MVP but I highly doubt the RFC will be approved 🤷.

@makslevental
Contributor Author

I found a third workaround: we can enlist the aid of nanobind, which itself of course keeps precise records (no pun intended) of which nanobind-bound types have been instantiated, and query it for the currently "live" operations. Demo here #160632. Note, not for the faint of heart (uses nb_internals.h).

@teqdruid
Contributor

From previous discussion on this issue, @ftynse wrote in ...

I actually did sketch that out @ #97340. If you take a look you'll notice how much starts to change when you try to implement this - specifically everywhere you return an Operation* you now need to return a weakref instead. I personally felt it was just too much work for not enough ROI (and also that core wouldn't take the PR). If you want to try to RFC that idea (weakref system) I could probably finish up an MVP but I highly doubt the RFC will be approved 🤷.

That looks like what @stellaraccident was talking about rather than the proposal I had. My proposal is nowhere near as invasive: simply a callback which one could register in MLIRContext (and which would be plumbed through the CAPI), to be called when an operation is destructed (i.e., its pointer is no longer valid). We could use it in the Python bindings to invalidate the PyOperation(s) which are fronting the erased operation.
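
A minimal Python-side sketch of how such a hook might surface (every name here is hypothetical except _set_invalid; no callback-registration API exists in the bindings today):

    live_handles = {}  # raw operation pointer -> live Python handle

    def _on_operation_erased(op_ptr):
        handle = live_handles.pop(op_ptr, None)
        if handle is not None:
            handle._set_invalid()  # real API from this PR: later accesses raise

    # ctx.register_operation_erased_callback(_on_operation_erased)  # hypothetical API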

The bindings aren't designed to be used the way you're using them though - they weren't designed for people to be able to run passes and then still be in a valid state. What's been drilled into my head over the years is they were conceived of and designed only to support emitting IR (i.e., a nicer test writing UX). So we're already where you want to be - just assume they are all already invalid always and don't try to do what you're trying to do 😄. Now certainly the bindings have evolved (e.g., thanks to you, thanks to many others) and they do support some light "industrial use" (rewrites and such) but the founding premise is still there - they're not meant to be safe for this kind of use. And just re-stating the above: the user in this instance isn't a DSL user - it's you the compiler developer and you're writing library code (not user code). So I think the expectations are the same as for C++; i.e., the Python library code you write needs to be as conservative as it would need to be if it were just C++ library code.

I think you're missing my point. It doesn't matter how the bindings are supposed to be used. (I've pivoted to using them "the right way".) When they're used improperly, the correct behavior is to throw an exception rather than just crash. This is the expectation for all Python libraries.

... I wonder if it's possible to "catch" a crash in C++ land and translate it to a Python exception. That would also fix this problem and many others. Yeah, the C++ side is possibly not valid anymore but what's the worst which can happen if you call into it? Another crash which would be caught?

@stellaraccident
Contributor

In the first part where you plussed me in, I almost stayed lurking, but you nerd-sniped me with the last paragraph :)

I don't think you can catch this kind of crash. Maybe if you were just dealing with segmentation faults due to null pointers (this is how things like Java work: translating such segfaults into NPE). But what you are actually dealing with is stale pointers into random parts of the heap. These may even be valid allocations, because the way the heap gets used in mlir is pretty deterministic in a lot of cases. So you'd just be randomly stomping memory, sometimes doing dirty reads, sometimes doing stray writes and sometimes deallocating memory owned by something else. No catching any of that -- you just need to not do such things.

Most of the time, it's time to acknowledge a design flaw when you start going down the path of implementing some kind of pointer tracking GC :)

@teqdruid
Contributor

I don't think you can catch this kind of crash. Maybe if you were just dealing with segmentation faults due to null pointers (this is how things like Java work: translating such segfaults into NPE). But what you are actually dealing with is stale pointers into random parts of the heap. These may even be valid allocations, because the way the heap gets used in mlir is pretty deterministic in a lot of cases. So you'd just be randomly stomping memory, sometimes doing dirty reads, sometimes doing stray writes and sometimes deallocating memory owned by something else. No catching any of that -- you just need to not do such things

Yeah, I realized that like 15 minutes after posting that thought.

Most of the time, it's time to acknowledge a design flaw when you start going down the path of implementing some kind of pointer tracking GC :)

I recognize that. As a thought experiment, what would be a better design to make the MLIR Python bindings "safe"/"robust" against this sort of behavior? I can't think of one, though I'm not that creative.
