-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[MLIR][Python] remove liveOperations
#155114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8c71176 to
2a0e205
Compare
liveOpeartions
|
@llvm/pr-subscribers-mlir Author: Maksim Levental (makslevental) ChangesInspired by a renewed interest in #139721 (which has become a little stale...) <p align="center"> In the previous go-around (#92631) there were two issues which have been resolved
The other potentially breaking change is cc @0xMihir @hawkinsp @superbobry Patch is 22.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155114.diff 10 Files Affected:
diff --git a/mlir/include/mlir-c/IR.h b/mlir/include/mlir-c/IR.h
index 71c7d4378677f..d20b46651383f 100644
--- a/mlir/include/mlir-c/IR.h
+++ b/mlir/include/mlir-c/IR.h
@@ -415,6 +415,8 @@ MLIR_CAPI_EXPORTED MlirOperation mlirModuleGetOperation(MlirModule module);
/// The returned module is null when the input operation was not a ModuleOp.
MLIR_CAPI_EXPORTED MlirModule mlirModuleFromOperation(MlirOperation op);
+MLIR_CAPI_EXPORTED bool mlirModuleEqual(MlirModule mod, MlirModule other);
+
//===----------------------------------------------------------------------===//
// Operation state.
//===----------------------------------------------------------------------===//
diff --git a/mlir/lib/Bindings/Python/IRCore.cpp b/mlir/lib/Bindings/Python/IRCore.cpp
index 4b3a06cbce854..23c4ec7c7f944 100644
--- a/mlir/lib/Bindings/Python/IRCore.cpp
+++ b/mlir/lib/Bindings/Python/IRCore.cpp
@@ -702,84 +702,6 @@ size_t PyMlirContext::getLiveCount() {
return getLiveContexts().size();
}
-size_t PyMlirContext::getLiveOperationCount() {
- nb::ft_lock_guard lock(liveOperationsMutex);
- return liveOperations.size();
-}
-
-std::vector<PyOperation *> PyMlirContext::getLiveOperationObjects() {
- std::vector<PyOperation *> liveObjects;
- nb::ft_lock_guard lock(liveOperationsMutex);
- for (auto &entry : liveOperations)
- liveObjects.push_back(entry.second.second);
- return liveObjects;
-}
-
-size_t PyMlirContext::clearLiveOperations() {
-
- LiveOperationMap operations;
- {
- nb::ft_lock_guard lock(liveOperationsMutex);
- std::swap(operations, liveOperations);
- }
- for (auto &op : operations)
- op.second.second->setInvalid();
- size_t numInvalidated = operations.size();
- return numInvalidated;
-}
-
-void PyMlirContext::clearOperation(MlirOperation op) {
- PyOperation *py_op;
- {
- nb::ft_lock_guard lock(liveOperationsMutex);
- auto it = liveOperations.find(op.ptr);
- if (it == liveOperations.end()) {
- return;
- }
- py_op = it->second.second;
- liveOperations.erase(it);
- }
- py_op->setInvalid();
-}
-
-void PyMlirContext::clearOperationsInside(PyOperationBase &op) {
- typedef struct {
- PyOperation &rootOp;
- bool rootSeen;
- } callBackData;
- callBackData data{op.getOperation(), false};
- // Mark all ops below the op that the passmanager will be rooted
- // at (but not op itself - note the preorder) as invalid.
- MlirOperationWalkCallback invalidatingCallback = [](MlirOperation op,
- void *userData) {
- callBackData *data = static_cast<callBackData *>(userData);
- if (LLVM_LIKELY(data->rootSeen))
- data->rootOp.getOperation().getContext()->clearOperation(op);
- else
- data->rootSeen = true;
- return MlirWalkResult::MlirWalkResultAdvance;
- };
- mlirOperationWalk(op.getOperation(), invalidatingCallback,
- static_cast<void *>(&data), MlirWalkPreOrder);
-}
-void PyMlirContext::clearOperationsInside(MlirOperation op) {
- PyOperationRef opRef = PyOperation::forOperation(getRef(), op);
- clearOperationsInside(opRef->getOperation());
-}
-
-void PyMlirContext::clearOperationAndInside(PyOperationBase &op) {
- MlirOperationWalkCallback invalidatingCallback = [](MlirOperation op,
- void *userData) {
- PyMlirContextRef &contextRef = *static_cast<PyMlirContextRef *>(userData);
- contextRef->clearOperation(op);
- return MlirWalkResult::MlirWalkResultAdvance;
- };
- mlirOperationWalk(op.getOperation(), invalidatingCallback,
- &op.getOperation().getContext(), MlirWalkPreOrder);
-}
-
-size_t PyMlirContext::getLiveModuleCount() { return liveModules.size(); }
-
nb::object PyMlirContext::contextEnter(nb::object context) {
return PyThreadContextEntry::pushContext(context);
}
@@ -1151,38 +1073,20 @@ PyLocation &DefaultingPyLocation::resolve() {
PyModule::PyModule(PyMlirContextRef contextRef, MlirModule module)
: BaseContextObject(std::move(contextRef)), module(module) {}
-PyModule::~PyModule() {
- nb::gil_scoped_acquire acquire;
- auto &liveModules = getContext()->liveModules;
- assert(liveModules.count(module.ptr) == 1 &&
- "destroying module not in live map");
- liveModules.erase(module.ptr);
- mlirModuleDestroy(module);
-}
+PyModule::~PyModule() { mlirModuleDestroy(module); }
PyModuleRef PyModule::forModule(MlirModule module) {
MlirContext context = mlirModuleGetContext(module);
PyMlirContextRef contextRef = PyMlirContext::forContext(context);
- nb::gil_scoped_acquire acquire;
- auto &liveModules = contextRef->liveModules;
- auto it = liveModules.find(module.ptr);
- if (it == liveModules.end()) {
- // Create.
- PyModule *unownedModule = new PyModule(std::move(contextRef), module);
- // Note that the default return value policy on cast is automatic_reference,
- // which does not take ownership (delete will not be called).
- // Just be explicit.
- nb::object pyRef = nb::cast(unownedModule, nb::rv_policy::take_ownership);
- unownedModule->handle = pyRef;
- liveModules[module.ptr] =
- std::make_pair(unownedModule->handle, unownedModule);
- return PyModuleRef(unownedModule, std::move(pyRef));
- }
- // Use existing.
- PyModule *existing = it->second.second;
- nb::object pyRef = nb::borrow<nb::object>(it->second.first);
- return PyModuleRef(existing, std::move(pyRef));
+ // Create.
+ PyModule *unownedModule = new PyModule(std::move(contextRef), module);
+ // Note that the default return value policy on cast is automatic_reference,
+ // which does not take ownership (delete will not be called).
+ // Just be explicit.
+ nb::object pyRef = nb::cast(unownedModule, nb::rv_policy::take_ownership);
+ unownedModule->handle = pyRef;
+ return PyModuleRef(unownedModule, std::move(pyRef));
}
nb::object PyModule::createFromCapsule(nb::object capsule) {
@@ -1207,16 +1111,8 @@ PyOperation::~PyOperation() {
// If the operation has already been invalidated there is nothing to do.
if (!valid)
return;
-
- // Otherwise, invalidate the operation and remove it from live map when it is
- // attached.
- if (isAttached()) {
- getContext()->clearOperation(*this);
- } else {
- // And destroy it when it is detached, i.e. owned by Python, in which case
- // all nested operations must be invalidated at removed from the live map as
- // well.
- erase();
+ if (!isAttached()) {
+ mlirOperationDestroy(operation);
}
}
@@ -1252,35 +1148,16 @@ PyOperationRef PyOperation::createInstance(PyMlirContextRef contextRef,
PyOperationRef PyOperation::forOperation(PyMlirContextRef contextRef,
MlirOperation operation,
nb::object parentKeepAlive) {
- nb::ft_lock_guard lock(contextRef->liveOperationsMutex);
- auto &liveOperations = contextRef->liveOperations;
- auto it = liveOperations.find(operation.ptr);
- if (it == liveOperations.end()) {
- // Create.
- PyOperationRef result = createInstance(std::move(contextRef), operation,
- std::move(parentKeepAlive));
- liveOperations[operation.ptr] =
- std::make_pair(result.getObject(), result.get());
- return result;
- }
- // Use existing.
- PyOperation *existing = it->second.second;
- nb::object pyRef = nb::borrow<nb::object>(it->second.first);
- return PyOperationRef(existing, std::move(pyRef));
+ // Create.
+ return createInstance(std::move(contextRef), operation,
+ std::move(parentKeepAlive));
}
PyOperationRef PyOperation::createDetached(PyMlirContextRef contextRef,
MlirOperation operation,
nb::object parentKeepAlive) {
- nb::ft_lock_guard lock(contextRef->liveOperationsMutex);
- auto &liveOperations = contextRef->liveOperations;
- assert(liveOperations.count(operation.ptr) == 0 &&
- "cannot create detached operation that already exists");
- (void)liveOperations;
PyOperationRef created = createInstance(std::move(contextRef), operation,
std::move(parentKeepAlive));
- liveOperations[operation.ptr] =
- std::make_pair(created.getObject(), created.get());
created->attached = false;
return created;
}
@@ -1652,7 +1529,6 @@ nb::object PyOperation::createOpView() {
void PyOperation::erase() {
checkValid();
- getContext()->clearOperationAndInside(*this);
mlirOperationDestroy(operation);
}
@@ -3023,14 +2899,6 @@ void mlir::python::populateIRCore(nb::module_ &m) {
PyMlirContextRef ref = PyMlirContext::forContext(self.get());
return ref.releaseObject();
})
- .def("_get_live_operation_count", &PyMlirContext::getLiveOperationCount)
- .def("_get_live_operation_objects",
- &PyMlirContext::getLiveOperationObjects)
- .def("_clear_live_operations", &PyMlirContext::clearLiveOperations)
- .def("_clear_live_operations_inside",
- nb::overload_cast<MlirOperation>(
- &PyMlirContext::clearOperationsInside))
- .def("_get_live_module_count", &PyMlirContext::getLiveModuleCount)
.def_prop_ro(MLIR_PYTHON_CAPI_PTR_ATTR, &PyMlirContext::getCapsule)
.def(MLIR_PYTHON_CAPI_FACTORY_ATTR, &PyMlirContext::createFromCapsule)
.def("__enter__", &PyMlirContext::contextEnter)
@@ -3349,6 +3217,7 @@ void mlir::python::populateIRCore(nb::module_ &m) {
nb::class_<PyModule>(m, "Module", nb::is_weak_referenceable())
.def_prop_ro(MLIR_PYTHON_CAPI_PTR_ATTR, &PyModule::getCapsule)
.def(MLIR_PYTHON_CAPI_FACTORY_ATTR, &PyModule::createFromCapsule)
+ .def("_clear_referrent", &PyModule::clearReferrent)
.def_static(
"parse",
[](const std::string &moduleAsm, DefaultingPyMlirContext context) {
@@ -3428,7 +3297,13 @@ void mlir::python::populateIRCore(nb::module_ &m) {
// Defer to the operation's __str__.
return self.attr("operation").attr("__str__")();
},
- kOperationStrDunderDocstring);
+ kOperationStrDunderDocstring)
+ .def(
+ "__eq__",
+ [](PyModule &self, PyModule &other) {
+ return mlirModuleEqual(self.get(), other.get());
+ },
+ "other"_a);
//----------------------------------------------------------------------------
// Mapping of Operation.
@@ -3440,7 +3315,8 @@ void mlir::python::populateIRCore(nb::module_ &m) {
})
.def("__eq__",
[](PyOperationBase &self, PyOperationBase &other) {
- return &self.getOperation() == &other.getOperation();
+ return mlirOperationEqual(self.getOperation().get(),
+ other.getOperation().get());
})
.def("__eq__",
[](PyOperationBase &self, nb::object other) { return false; })
diff --git a/mlir/lib/Bindings/Python/IRModule.h b/mlir/lib/Bindings/Python/IRModule.h
index fa16ae3ce3294..e1a71c627a902 100644
--- a/mlir/lib/Bindings/Python/IRModule.h
+++ b/mlir/lib/Bindings/Python/IRModule.h
@@ -218,40 +218,6 @@ class PyMlirContext {
/// Gets the count of live context objects. Used for testing.
static size_t getLiveCount();
- /// Get a list of Python objects which are still in the live context map.
- std::vector<PyOperation *> getLiveOperationObjects();
-
- /// Gets the count of live operations associated with this context.
- /// Used for testing.
- size_t getLiveOperationCount();
-
- /// Clears the live operations map, returning the number of entries which were
- /// invalidated. To be used as a safety mechanism so that API end-users can't
- /// corrupt by holding references they shouldn't have accessed in the first
- /// place.
- size_t clearLiveOperations();
-
- /// Removes an operation from the live operations map and sets it invalid.
- /// This is useful for when some non-bindings code destroys the operation and
- /// the bindings need to made aware. For example, in the case when pass
- /// manager is run.
- ///
- /// Note that this does *NOT* clear the nested operations.
- void clearOperation(MlirOperation op);
-
- /// Clears all operations nested inside the given op using
- /// `clearOperation(MlirOperation)`.
- void clearOperationsInside(PyOperationBase &op);
- void clearOperationsInside(MlirOperation op);
-
- /// Clears the operaiton _and_ all operations inside using
- /// `clearOperation(MlirOperation)`.
- void clearOperationAndInside(PyOperationBase &op);
-
- /// Gets the count of live modules associated with this context.
- /// Used for testing.
- size_t getLiveModuleCount();
-
/// Enter and exit the context manager.
static nanobind::object contextEnter(nanobind::object context);
void contextExit(const nanobind::object &excType,
@@ -278,25 +244,6 @@ class PyMlirContext {
static nanobind::ft_mutex live_contexts_mutex;
static LiveContextMap &getLiveContexts();
- // Interns all live modules associated with this context. Modules tracked
- // in this map are valid. When a module is invalidated, it is removed
- // from this map, and while it still exists as an instance, any
- // attempt to access it will raise an error.
- using LiveModuleMap =
- llvm::DenseMap<const void *, std::pair<nanobind::handle, PyModule *>>;
- LiveModuleMap liveModules;
-
- // Interns all live operations associated with this context. Operations
- // tracked in this map are valid. When an operation is invalidated, it is
- // removed from this map, and while it still exists as an instance, any
- // attempt to access it will raise an error.
- using LiveOperationMap =
- llvm::DenseMap<void *, std::pair<nanobind::handle, PyOperation *>>;
- nanobind::ft_mutex liveOperationsMutex;
-
- // Guarded by liveOperationsMutex in free-threading mode.
- LiveOperationMap liveOperations;
-
bool emitErrorDiagnostics = false;
MlirContext context;
@@ -575,6 +522,8 @@ class PyModule : public BaseContextObject {
/// is taken by calling this function.
static nanobind::object createFromCapsule(nanobind::object capsule);
+ void clearReferrent() { module = {nullptr}; }
+
private:
PyModule(PyMlirContextRef contextRef, MlirModule module);
MlirModule module;
diff --git a/mlir/lib/Bindings/Python/Pass.cpp b/mlir/lib/Bindings/Python/Pass.cpp
index 20017e25b69bb..817479ee2421b 100644
--- a/mlir/lib/Bindings/Python/Pass.cpp
+++ b/mlir/lib/Bindings/Python/Pass.cpp
@@ -159,11 +159,7 @@ void mlir::python::populatePassManagerSubmodule(nb::module_ &m) {
"ValueError if the pipeline can't be parsed.")
.def(
"run",
- [](PyPassManager &passManager, PyOperationBase &op,
- bool invalidateOps) {
- if (invalidateOps) {
- op.getOperation().getContext()->clearOperationsInside(op);
- }
+ [](PyPassManager &passManager, PyOperationBase &op) {
// Actually run the pass manager.
PyMlirContext::ErrorCapture errors(op.getOperation().getContext());
MlirLogicalResult status = mlirPassManagerRunOnOp(
@@ -172,7 +168,7 @@ void mlir::python::populatePassManagerSubmodule(nb::module_ &m) {
throw MLIRError("Failure while executing pass pipeline",
errors.take());
},
- "operation"_a, "invalidate_ops"_a = true,
+ "operation"_a,
"Run the pass manager on the provided operation, raising an "
"MLIRError on failure.")
.def(
diff --git a/mlir/lib/Bindings/Python/TransformInterpreter.cpp b/mlir/lib/Bindings/Python/TransformInterpreter.cpp
index f9b0fed62778f..920bca886f617 100644
--- a/mlir/lib/Bindings/Python/TransformInterpreter.cpp
+++ b/mlir/lib/Bindings/Python/TransformInterpreter.cpp
@@ -67,7 +67,6 @@ static void populateTransformInterpreterSubmodule(nb::module_ &m) {
// root. This is awkward, but we don't have access to PyMlirContext
// object here otherwise.
nb::object obj = nb::cast(payloadRoot);
- obj.attr("context").attr("_clear_live_operations_inside")(payloadRoot);
MlirLogicalResult result = mlirTransformApplyNamedSequence(
payloadRoot, transformRoot, transformModule, options.options);
diff --git a/mlir/lib/CAPI/IR/IR.cpp b/mlir/lib/CAPI/IR/IR.cpp
index 8491553dab76f..7ea799c836889 100644
--- a/mlir/lib/CAPI/IR/IR.cpp
+++ b/mlir/lib/CAPI/IR/IR.cpp
@@ -465,6 +465,10 @@ MlirModule mlirModuleFromOperation(MlirOperation op) {
return wrap(dyn_cast<ModuleOp>(unwrap(op)));
}
+bool mlirModuleEqual(MlirModule mod, MlirModule other) {
+ return unwrap(mod) == unwrap(other);
+}
+
//===----------------------------------------------------------------------===//
// Operation state API.
//===----------------------------------------------------------------------===//
diff --git a/mlir/test/python/ir/module.py b/mlir/test/python/ir/module.py
index 6065e59fd6ed9..4a9cf9d44a8aa 100644
--- a/mlir/test/python/ir/module.py
+++ b/mlir/test/python/ir/module.py
@@ -121,27 +121,16 @@ def testRoundtripBinary():
def testModuleOperation():
ctx = Context()
module = Module.parse(r"""module @successfulParse {}""", ctx)
- assert ctx._get_live_module_count() == 1
op1 = module.operation
- assert ctx._get_live_operation_count() == 1
- live_ops = ctx._get_live_operation_objects()
- assert len(live_ops) == 1
- assert live_ops[0] is op1
- live_ops = None
# CHECK: module @successfulParse
print(op1)
# Ensure that operations are the same on multiple calls.
op2 = module.operation
- assert ctx._get_live_operation_count() == 1
- assert op1 is op2
+ assert op1 == op2
# Test live operation clearing.
op1 = module.operation
- assert ctx._get_live_operation_count() == 1
- num_invalidated = ctx._clear_live_operations()
- assert num_invalidated == 1
- assert ctx._get_live_operation_count() == 0
op1 = None
gc.collect()
op1 = module.operation
@@ -155,9 +144,6 @@ def testModuleOperation():
op1 = None
op2 = None
gc.collect()
- print("LIVE OPERATIONS:", ctx._get_live_operation_count())
- assert ctx._get_live_operation_count() == 0
- assert ctx._get_live_module_count() == 0
# CHECK-LABEL: TEST: testModuleCapsule
@@ -165,16 +151,14 @@ def testModuleOperation():
def testModuleCapsule():
ctx = Context()
module = Module.parse(r"""module @successfulParse {}""", ctx)
- assert ctx._get_live_module_count() == 1
# CHECK: "mlir.ir.Module._CAPIPtr"
module_capsule = module._CAPIPtr
print(module_capsule)
module_dup = Module._CAPICreate(module_capsule)
- assert module is module_dup
+ module._clear_referrent()
assert module_dup.context is ctx
# Gc and verify destructed.
module = None
module_capsule = None
module_dup = None
gc.collect()
- assert ctx._get_live_module_count() == 0
diff --git a/mlir/test/python/ir/operation.py b/mlir/test/python/ir/operation.py
index bf16e3f75d60d..c95a97b9731a2 100644
--- a/mlir/test/python/ir/operation.py
+++ b/mlir/test/python/ir/operation.py
@@ -907,7 +907,7 @@ def testCapsuleConversions():
m_capsule = m._CAPIPtr
assert '"mlir.ir.Operation._CAPIPtr"' in repr(m_capsule)
m2 = Operation._CAPICreate(m_capsule)
- assert m2 is m
+ assert m2 == m
# CHECK-LABEL: TEST: testOperationErase
diff --git a/mlir/test/python/ir/symbol_table.py b/mlir/test/python/ir/symbol_table.py
index 8b6d7ea5a197d..7afd539271d21 100644
--- a/mlir/test/python/ir/symbol_table.py
+++ b/mlir/test/python/ir/symbol_table.py
@@ -56,14 +56,6 @@ def testSymbolTableInsert():
print(m1)
assert "bar" not in symbol_table
- try:
- print(bar)
- except RuntimeError as e:
- if "the operation has been invalidated" not in str(e):
- raise
- else:
- assert False, "expected RuntimeError due to invalidated operation"
-
qux = m2.body.operations[0]
m1.body.append(qux)
symbol_table.insert(qux)
diff --git a/mlir/test/python/pass_manager.py b/mlir/test/python/pass_manager.py
index e26d42bb32913..0896cd9784641 100644
--- a/mlir/test/python/pass_manager.py
...
[truncated]
|
liveOpeartionsliveOperations
3de0c41 to
b32e0f3
Compare
liveOperationsliveOperations
6041b0c to
266540a
Compare
|
I kicked off a manual build on sanitizer-x86_64-linux-bootstrap-asan |
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/24/builds/12189 Here is the relevant piece of the build log for the reference |
|
I found that python |
Good catch. Yes let me do that right now. |
#155114 broke op hashing (because the python objects ceased to be reference equivalent). This PR fixes by binding `OperationEquivalence::computeHash`.
llvm/llvm-project#155114 broke op hashing (because the python objects ceased to be reference equivalent). This PR fixes by binding `OperationEquivalence::computeHash`.
llvm/llvm-project#155114 broke op hashing (because the python objects ceased to be reference equivalent). This PR fixes by binding `OperationEquivalence::computeHash`.
|
This broke me downstream! (PyCDE in the CIRCT repo.) I was calling So, I was using I was not using (as in dependent on) the caching behavior of the Is there any way for me to get this debugging feature back without a |
I added an API llvm-project/mlir/test/python/pass_manager.py Line 234 in b5daf76
So you can invalidate ops by hand. Now this requires actually being able to "find" them i.e., they should already have been instantiated on the Python side and be accessible/reachable by you at the point at which you return from the call to the pass manager. The first part was always a requirement (the |
Yes, this is the trick. I don't have a centralized, fool-proof way of tracking all of the operations which get created. Given my statement regarding debug-ability above, is there any interest in adding this to the MLIR Python bindings themselves? Or am I the only one who has scars from this issue? We'd just have to track the set of all |
The whole reason for removing I can suggest two workarounds:
You're the first to complain 😄 but even if you are the only one, I/we broke you and that's bad. So if these workarounds aren't sufficient then I can keep pondering this and if worse comes to worst we can restore |
|
Re using def _getr(slist, olist, seen):
for e in slist:
if id(e) in seen:
continue
seen[id(e)] = None
olist.append(e)
tl = gc.get_referents(e)
if tl:
_getr(tl, olist, seen)
def get_all_objects():
gcl = gc.get_objects()
olist = []
seen = {}
seen[id(gcl)] = None
seen[id(olist)] = None
seen[id(seen)] = None
_getr(gcl, olist, seen)
return olist
ops = [op1, op2, op3]
for obj in get_all_objects():
if isinstance(obj, (Operation, OpView)):
print(obj)EDIT: apparently there's a blessed way to do this https://docs.python.org/3/library/sys.html#sys.getobjects but it's behind a compile flag for Python itself. More info about this problem here (linked in the SO). |
|
Going through all of the objects in Python just to find the
I don't think closing these as "fixed" by claiming this behavior is correct by the new rules is accurate. More of a "wontfix". It's essentially defining the problem away. IMO any Python API call which crashes the interpreter is a legitimate bug and this doesn't address them at all. My opinion is that we need to be conservative by default (endeavor to never crash at the cost of occasionally disallowing potentially valid things) and let users opt-in to behavior which may cause crashes. For instance, we should invalidate all From previous discussion on this issue, @ftynse wrote in #93339:
I proposed exactly this a few years ago. (Can't remember exactly when, but I wrote about the feedback here.) Something like this would probably help not just with Python but all managed-memory-language bindings. I continue to think this is a good idea and a mechanism like this is necessary to make pointers "safe" for other languages. Maybe it's time to re-broach this idea? |
I mean only one of these is like that (#63916) - the rest were race conditions/bugs/etc. that were patched/re-patched/re-considered over and over.
This is GC vs no-GC argument in different clothing. The problem is we don't have a choice here - C++ has already made the choice for us right? In C++ code users (us devs or whomever) can hold invalidated references and there's no attempt made whatsoever to protect them from it (cf the dreaded
The bindings aren't designed to be used the way you're using them though - they weren't designed for people to be able to run passes and then still be in a valid state. What's been drilled into my head over the years is they were conceived of and designed only to support emitting IR (i.e., a nicer test writing UX). So we're already where you want to be - just assume they are all already invalid always and don't try to do what you're trying to do 😄. Now certainly the bindings have evolved (e.g., thanks to you, thanks to many others) and they do support some light "industrial use" (rewrites and such) but the founding premise is still there - they're not meant to be safe for this kind of use. And just re-stating the above: the user in this instance isn't a DSL user - it's you the compiler developer and you're writing library code (not user code). So I think the expectations are the same as for C++; i.e., the Python library code you write needs to be as conservative as it would need to be if it were just C++ library code.
I actually did sketch that out @ #97340. If you take a look you'll notice how much starts to change when you try to implement this - specifically everywhere you return an |
|
I found a third workaround: we can enlist the aid of nanobind, which itself of course keeps precise records (no pun intended) on which nanobinded types have been instantiated, and query for currently "live operations". Demo here #160632. Note, not for the faint of heart (uses |
That looks like what @stellaraccident was talking about rather than the proposal I had. My proposal is nowhere near as invasive. Simply a callback which one could register in MLIRContext (and plumbed through the CAPI) which would be called when an operation is destructed (pointer is no longer valid). We could use it in the Python bindings to invalidate the
I think you're missing my point. It doesn't matter how the bindings are supposed to be used. (I've pivoted to using them "the right way".) When they're used improperly, the correct behavior is to throw an exception rather than just crash. This is the expectation for all Python libraries. ... I wonder if it's possible to "catch" a crash in C++ land and translate it to a Python exception. That would also fix this problem and many others. Yeah, the C++ side is possibly not valid anymore but what's the worst which can happen if you call into it? Another crash which would be caught? |
|
In the first part where you plussed me in, I almost stayed lurking, but you nerd sniped me with the last paragraph :) I don't think you can catch this kind of crash. Maybe if you were just dealing with segmentation faults due to null pointers (this is how things like Java work: translating such segfaults into NPE). But what you are actually dealing with is stale pointers into random parts of the heap. These may even be valid allocations, because the way the heap gets used in mlir is pretty deterministic in a lot of cases. So you'd just be randomly stomping memory, sometimes doing dirty reads, sometimes doing stray writes and sometimes deallocating memory owned by something else. No catching any of that -- you just need to not do such things. Most of the time, it's time to acknowledge a design flaw when you start going down the path of implementing some kind of pointer tracking GC :) |
Yeah, I realized that like 15 minutes after posting that thought.
I recognize that. As a thought experiment, what would be a better design to make the MLIR Python bindings "safe"/"robust" to this sort of behavior? I can't think of one though I'm not that creative. |
Historical context:
PyMlirContext::liveOperationswas an optimization meant to cut down on the number of Python object allocations and (partially) a mechanism for updating validity of ops after transformation. E.g. during walking/transforming the AST. See original patch here.Inspired by a renewed interest in #139721 (which has become a little stale...)
In the previous go-around (#92631) there were two issues which have been resolved
ops that were "fetched" under a root op which has been transformed are no longer reported as invalid. We simply "formally forbid" this;
Module._CAPICreate(module_capsule)must now be followed by amodule._clear_mlir_module()to prevent double-freeing of the actualModuleOpobject (i.e. calling the dtor on theOwningOpRef<ModuleOp>):Module._CAPICreateAPI altogether and replace it with something likeModule._move(module)which will do bothModule._CAPICreateandmodule._clear_mlir_module.Note, the other approach I explored last year was a weakref system for
mlir::Operationwhich would effectively hoist thisliveOperationsthing into MLIR core. Possibly doable but I now believe it's a bad idea.The other potentially breaking change is
is, which checks object equality rather than value equality, will now reportFalsebecause we are always allocatingnewPython objects (ie that's the whole point of this change). Users wanting to check equality forOperationandModuleshould use==.cc @0xMihir @hawkinsp @superbobry @christopherbate @chhzh123
EDIT: Sanity check: no one is using Module._CAPICreate