Bug#37140331 MySQL NDB Cluster is crashing with Signal 8 error (Floating Point Exception) 2/2

vinc13e · vinc13e · commit 114529322a08 · 2024-12-10T09:26:35.000Z
Problem:
When a scan on the ndbinfo table FRAG_MEM_USE or FRAG_OPERATIONS
is performed, a DBINFO_SCANREQ signal is sent to LQH. While
handling that signal, LQH asks TUP for the fragStats of all
tables defined in the cluster.
If a create (or drop) table operation is in progress in parallel
with the ndbinfo scan, LQH's view of the table's fragments can
differ from TUP's view of those fragments, which can lead to a
crash in either TUP/ACC or LQH.

In particular, if DBINFO_SCANREQ finds the new table with status
ADD_TABLE_ONGOING, TUP/ACC may be unable to return the status of
the new table's fragments, since at that point the fragment
information in TUP/ACC has not yet been updated.
Similarly, during drop table, if the status of the target table
in LQH is DROP_TABLE_* or PREP_DROP_*, LQH and TUP/ACC may have
different views of the fragments of that table.

Solution:
During a scan of the ndbinfo FRAG_MEM_USE or FRAG_OPERATIONS table,
ignore all fragments from tables that could be in a transient state
at that moment -- tables being created or dropped.
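
As a rough illustration of the filter described above, the sketch below
contrasts the old check with the new one. It is a simplified stand-in,
not the LQH code: the TableStatus enum is reduced, and the two
*_ONGOING drop values are hypothetical placeholders for the
DROP_TABLE_* and PREP_DROP_* states; isStableForFragScan, oldCheck and
countScannable are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reduced stand-in for Dblqh's Tablerec status. PREP_DROP_ONGOING and
// DROP_TABLE_ONGOING are hypothetical placeholders for the real
// PREP_DROP_* / DROP_TABLE_* states mentioned in the commit message.
enum class TableStatus : std::uint8_t {
  NOT_DEFINED,
  ADD_TABLE_ONGOING,
  TABLE_DEFINED,
  TABLE_READ_ONLY,
  PREP_DROP_ONGOING,
  DROP_TABLE_ONGOING
};

// New check: only tables in a stable state are scanned.
constexpr bool isStableForFragScan(TableStatus s) {
  return s == TableStatus::TABLE_DEFINED ||
         s == TableStatus::TABLE_READ_ONLY;
}

// Old check: anything defined at all was scanned, including tables
// that were still being created or dropped.
constexpr bool oldCheck(TableStatus s) {
  return s != TableStatus::NOT_DEFINED;
}

// Count the tables whose fragments the scan would visit.
std::size_t countScannable(const std::vector<TableStatus> &tables) {
  std::size_t n = 0;
  for (TableStatus s : tables) {
    if (isStableForFragScan(s)) {
      n++;
    }
  }
  return n;
}
```

With this shape, a table in ADD_TABLE_ONGOING passes oldCheck but is
rejected by isStableForFragScan, so its fragments are never handed to
the stats lookup while TUP/ACC may still be out of sync.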

There are more ndbinfo tables related to fragments that,
theoretically, could have the same issue as the two fixed in
this patch, but none of those have a gap between the handling
of the add (or release) table and the handling of its fragments,
whether in the same block or in a different one.

Change-Id: I9370b21150d429bbc338caa2ced4ba17042c3998
diff --git a/storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp b/storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp
@@ -32288,7 +32288,8 @@ void Dblqh::execDBINFO_SCANREQ(Signal *signal) {
         TablerecPtr tabPtr;
         tabPtr.i = tableid;
         ptrAss(tabPtr, tablerec);
-        if (tabPtr.p->tableStatus != Tablerec::NOT_DEFINED) {
+        if (tabPtr.p->tableStatus == Tablerec::TABLE_DEFINED ||
+            tabPtr.p->tableStatus == Tablerec::TABLE_READ_ONLY) {
           jam();
           // Loop over all fragments for this table.
           for (Uint32 f = 0; f < NDB_ARRAY_SIZE(tabPtr.p->fragrec); f++) {
@@ -32373,7 +32374,8 @@ void Dblqh::execDBINFO_SCANREQ(Signal *signal) {
         TablerecPtr tabPtr;
         tabPtr.i = tableid;
         ptrAss(tabPtr, tablerec);
-        if (tabPtr.p->tableStatus != Tablerec::NOT_DEFINED) {
+        if (tabPtr.p->tableStatus == Tablerec::TABLE_DEFINED ||
+            tabPtr.p->tableStatus == Tablerec::TABLE_READ_ONLY) {
           jam();
           // Loop over the fragments of this table.
           for (Uint32 fragNo = 0; fragNo < NDB_ARRAY_SIZE(tabPtr.p->fragrec);
@@ -32382,7 +32384,6 @@ void Dblqh::execDBINFO_SCANREQ(Signal *signal) {
             if ((myFragPtr.i = tabPtr.p->fragrec[fragNo]) != RNIL) {
               jam();
               c_fragment_pool.getPtr(myFragPtr);
-
               /* Get fragment's stats from TUP */
               const Dbtup::FragStats fs =
                   c_tup->get_frag_stats(myFragPtr.p->tupFragptr);