Merge #151811 #151850 #152855

craig[bot] · bghal · williamchoe3 · craig[bot] · commit 28ae229b3187 · 2025-09-04T15:43:57.000Z
151811: rfcs: tiniest spelling fix r=bghal a=bghal

TSIA

Epic: none
Release note: None

151850: roachtest: extract Fatal-level log messages to facilitate triage r=srosenberg,rickystewart,herkolategan a=williamchoe3

Fixes: #147360

### Motivation

Currently, when triaging an issue that originates from a Monitor watching a node, you get a message that will most likely require you to download the CI logs, then find and unzip the artifact. As mentioned in the linked issue, a simple grep on the node's logs can help identify the issue quickly, and there are cases where the roachtest failure can be categorized as an infra-related flake (e.g. clock sync). This enhanced logging can also help with older issues whose artifacts get wiped after the retention period expires.

### Changes

For every failure, after artifact collection, we call a new function `inspectArtifacts()`, which runs a grep on the node logs to look for fatal-level logs. If any are found, we save those logs and append them to the `message` string we pass to the `GithubPoster` interface, which eventually passes the message to `issues.Body`.

In `issues.Body`, we call a new `TemplateData.CondensedMessage` message formatter method, `FatalNodeRoachtest`, which is similar to the existing `FatalOrPanic` and `RSGCrash`, in order to better format the GitHub issue message (see below for an example).

* Note: I attempted to use the existing `CondensedMessage.FatalOrPanic`, but since we're only passing in a subset of the logs and because that method seems to expect a "go test like" message string, I opted to create a new method with its own regex pattern to match this new message.

### Verification

Added 2 new manual roachtests: one covers the `registry.TestSpec.Monitor = True` case, and the other covers the case where the test-level node monitor is not set and a monitor defined in the test case watches a specific node.

Used an internal SQL statement, `SELECT crdb_internal.force_log_fatal('oops');`, to mock fatal node behavior:

* https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sem/builtins/builtins.go#L6061
* https://docs.google.com/presentation/d/153LwR070a-BW1LGTv3SFLyB96aEVQQUvyKKWmzyO8jw/edit?slide=id.p#slide=id.p

Manually verified on a local single-node cluster, a local multi-node cluster, a remote single-node cluster, and a remote multi-node cluster.

For GitHub markdown rendering, added a data-driven test to `pkg/cmd/roachtest/github_test.go`. Decided not to add a case to `pkg/cmd/bazci/githubpost/issues/issues_test.go` because it would be the same test case and therefore redundant. However, I did add a new formatter to `pkg/cmd/bazci/githubpost/issues/formatter_unit.go`, so I can see the argument for also including the test case in the `issues` package along with the test case in `roachtest`.

### Misc / Design decisions

The grep is currently limited to 10 matching lines. I chose that number arbitrarily and am open to changing it.

Technically, I don't think I needed concurrency control for `githubMessage` because it is only written during test teardown / cleanup, but I added it in case we ever append to that string when we're not serial.

Initially I wanted to run grep on each node via `Cluster.RunE()` and return those results to the test runner. However, by the time we are in the monitor defer block, the context cancellation signal has already been sent, so `Cluster.RunE()` is unable to run.

Originally I was wrapping errors thrown by the monitor with a new Monitor-specific error type, but after [this thread discussion](#151850 (comment)), in order to capture unmonitored node fatals / panics, we decided to call `inspectArtifacts` on every failure, not just monitor-specific failures. This adds an additional grep command to every failure, but it should only take a few seconds, and the tradeoff for better logging was prioritized.

### E.g.

GitHub issue with fatal logs: #152540

<img width="1347" height="690" alt="image" src="https://github.com/user-attachments/assets/f28365b1-5c04-469f-aa8a-abf2085a5474" />

152855: stmtdiagnostics: Add support for transaction diagnostics r=kyle-a-wong a=kyle-a-wong

Adds a new TxnRegistry and other supporting structs to support the collection of transaction diagnostic bundles. The TxnRegistry adds functionality to:

- Register a TxnRequest
  - This defines the criteria for collecting a transaction diagnostic bundle
- Start collecting a transaction bundle
  - This is done by checking that a statement fingerprint id matches the first statement fingerprint id in a TxnRequest
- Save a transaction diagnostic bundle upon completion to be downloaded in the future

Since the system tables to persist transaction diagnostics and transaction diagnostics requests don't exist yet, this commit only registers requests in the local registry. A future commit will add request and diagnostic persistence, as well as polling logic to register requests created on other gateway nodes.

Part of: [CRDB-5342](https://cockroachlabs.atlassian.net/browse/CRDB-5342)
Epic: [CRDB-53541](https://cockroachlabs.atlassian.net/browse/CRDB-53541)
Release note: None

Co-authored-by: Brendan Gerrity <brendan.gerrity@cockroachlabs.com>
Co-authored-by: William Choe <williamchoe3@gmail.com>
Co-authored-by: Kyle Wong <37189875+kyle-a-wong@users.noreply.github.com>
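
The message split that #151850 describes under Changes can be tried in isolation. The sketch below reuses the regular expression from `roachtestNodeFatalRE` in the diff; the `fatalSplit` type, the `splitFatal` function, and the sample input are hypothetical stand-ins for illustration, not the code in `condense.go`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nodeFatalRE is copied from roachtestNodeFatalRE in the diff. Group 1 is the
// text (at least one line) before the first run of Fatal-level lines; group 2
// is that run of consecutive lines starting with "F" and six digits.
var nodeFatalRE = regexp.MustCompile(`(?ms)\A(.*?\n)((?:^F\d{6}\b[^\n]*(?:\n|$))+)`)

// fatalSplit is a hypothetical stand-in for the FatalNodeRoachtest struct.
type fatalSplit struct {
	Message   string
	FatalLogs string
}

// splitFatal separates the failure message from the fatal log lines.
// ok is false when no Fatal-level line follows the message.
func splitFatal(s string) (fs fatalSplit, ok bool) {
	m := nodeFatalRE.FindStringSubmatch(s)
	if m == nil {
		return fatalSplit{}, false
	}
	return fatalSplit{
		Message:   strings.TrimSuffix(m[1], "\n"), // drop the newline before the fatal lines
		FatalLogs: m[2],
	}, true
}

func main() {
	// Made-up input shaped like the message the test runner builds.
	in := "monitor failure: connection refused\n" +
		"test artifacts and logs in: artifacts/run_1\n" +
		"F250826 19:49:07.194443 3106 builtins.go:6063 force_log_fatal(): oops\n"
	if fs, ok := splitFatal(in); ok {
		fmt.Printf("message:\n%s\n\nfatal logs:\n%s", fs.Message, fs.FatalLogs)
	}
}
```

Because the leading group is lazy and anchored with `\A`, the split happens at the first line that looks like a Fatal-level entry, which is why the failure message and the artifacts path end up together in the first code block of the rendered issue.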
diff --git a/docs/tech-notes/version_upgrades.md b/docs/tech-notes/version_upgrades.md
@@ -141,7 +141,7 @@ version") is also handled via the upgrade process, as follows:
 ## Enforcing the invariants during upgrades in single-tenancy
 
 In the single-tenant case, the upgrade process (in
-`pkg/upgrace/upgrademanager/manager.go`, `Migrate()`) enforces the
+`pkg/upgrade/upgrademanager/manager.go`, `Migrate()`) enforces the
 invariants as follows:
 
 1. the current cluster version X is observed.
diff --git a/pkg/cmd/bazci/githubpost/issues/condense.go b/pkg/cmd/bazci/githubpost/issues/condense.go
@@ -49,6 +49,12 @@ type RSGCrash struct {
 	Schema string // the schema that the crash was induced with
 }
 
+// FatalNodeRoachtest contains a fatal error from a node from a roachtest
+type FatalNodeRoachtest struct {
+	Message,
+	FatalLogs string
+}
+
 // A CondensedMessage is a test log output garnished with useful helper methods
 // that extract concise information for seamless debugging.
 type CondensedMessage string
@@ -61,6 +67,8 @@ var fatalRE = regexp.MustCompile(`(?ms)(^F\d{6}.*?\n)(goroutine \d+.*?\n)\n`)
 var crasherRE = regexp.MustCompile(`(?s)( *rsg_test.go:\d{3}: Crash detected:.*?\n)(.*?;\n)`)
 var reproRE = regexp.MustCompile(`(?s)( *rsg_test.go:\d{3}: To reproduce, use schema:)`)
 
+var roachtestNodeFatalRE = regexp.MustCompile(`(?ms)\A(.*?\n)((?:^F\d{6}\b[^\n]*(?:\n|$))+)`)
+
 // FatalOrPanic constructs a FatalOrPanic. If no fatal or panic occurred in the
 // test, ok=false is returned.
 func (s CondensedMessage) FatalOrPanic(numPrecedingLines int) (fop FatalOrPanic, ok bool) {
@@ -98,6 +106,23 @@ func (s CondensedMessage) RSGCrash(lineLimit int) (c RSGCrash, ok bool) {
 	return RSGCrash{}, false
 }
 
+// FatalNodeRoachtest constructs a FatalNodeRoachtest which is used to
+// construct an issue with node fatal logs in a Roachtest. If not found, or if
+// regex matching doesn't return the exact expected number of matches,
+// ok=false is returned
+func (s CondensedMessage) FatalNodeRoachtest() (fnr FatalNodeRoachtest, ok bool) {
+	ss := string(s)
+	if matches := roachtestNodeFatalRE.FindStringSubmatchIndex(ss); matches != nil {
+		if len(matches) != 6 {
+			return FatalNodeRoachtest{}, false
+		}
+		fnr.Message = ss[matches[2] : matches[3]-1]
+		fnr.FatalLogs = ss[matches[4]:matches[5]]
+		return fnr, true
+	}
+	return FatalNodeRoachtest{}, false
+}
+
 // String calls .Digest(30).
 func (s CondensedMessage) String() string {
 	return s.Digest(30)
diff --git a/pkg/cmd/bazci/githubpost/issues/formatter_unit.go b/pkg/cmd/bazci/githubpost/issues/formatter_unit.go
@@ -69,6 +69,11 @@ func (unitTestFormatterTyp) Body(r *Renderer, data TemplateData) error {
 			r.Escaped("Schema:")
 			r.CodeBlock("", rsgCrash.Schema)
 		}
+	} else if fnr, ok := data.CondensedMessage.FatalNodeRoachtest(); ok {
+		r.Escaped("Failed with:")
+		r.CodeBlock("", fnr.Message)
+		r.Escaped("Fatal entries found in Cockroach logs:")
+		r.CodeBlock("", fnr.FatalLogs)
 	} else {
 		r.CodeBlock("", data.CondensedMessage.Digest(50))
 	}
diff --git a/pkg/cmd/roachtest/github_test.go b/pkg/cmd/roachtest/github_test.go
@@ -225,6 +225,11 @@ func TestCreatePostRequest(t *testing.T) {
 						case "lose-error-object":
 							// Lose the error object which should make our flake detection fail.
 							refError = errors.Newf("%s", redact.SafeString(refError.Error()))
+						case "node-fatal":
+							refError = errors.Newf(`(monitor.go:267).Wait: monitor failure: dial tcp 127.0.0.1:29000: connect: connection refused
+test artifacts and logs in: artifacts/roachtest/manual/monitor/test-failure/node-fatal-explicit-monitor/cpu_arch=arm64/run_1
+F250826 19:49:07.194443 3106 sql/sem/builtins/builtins.go:6063 ⋮ [T1,Vsystem,n1,client=127.0.0.1:54552,hostssl,user=‹roachprod›] 250  force_log_fatal(): ‹oops›
+`)
 						}
 					}
 				}
diff --git a/pkg/cmd/roachtest/test_impl.go b/pkg/cmd/roachtest/test_impl.go
@@ -140,6 +140,10 @@ type testImpl struct {
 		// parameters if there is a failure. They will additionally be logged in the test itself
 		// in case github issue posting is disabled.
 		extraParams map[string]string
+
+		// githubMessage contains additional message information that will be
+		// passed to github.MaybePost
+		githubMessage string
 	}
 	// Map from version to path to the cockroach binary to be used when
 	// mixed-version test wants a binary for that binary. If a particular version
@@ -558,6 +562,18 @@ func (t *testImpl) failureMsg() string {
 	return b.String()
 }
 
+func (t *testImpl) getGithubMessage() string {
+	t.mu.RLock()
+	defer t.mu.RUnlock()
+	return t.mu.githubMessage
+}
+
+func (t *testImpl) appendGithubMessage(msg string) {
+	t.mu.Lock()
+	defer t.mu.Unlock()
+	t.mu.githubMessage += msg
+}
+
 // failuresMatchingError checks whether the first error in trees of
 // any of the errors in the failures passed match the `refError`
 // target. If it does, `refError` is set to that target error value
diff --git a/pkg/cmd/roachtest/test_runner.go b/pkg/cmd/roachtest/test_runner.go
@@ -17,6 +17,7 @@ import (
 	"net"
 	"net/http"
 	"os"
+	"os/exec"
 	"path/filepath"
 	"runtime"
 	"sort"
@@ -1268,8 +1269,12 @@ func (r *testRunner) runTest(
 
 				output := fmt.Sprintf("%s\ntest artifacts and logs in: %s", failureMsg, t.ArtifactsDir())
 				params := getTestParameters(t, issueInfo.cluster, issueInfo.vmCreateOpts)
+				githubMsg := output
+				if testGithubMsg := t.getGithubMessage(); testGithubMsg != "" {
+					githubMsg = fmt.Sprintf("%s\n%s", output, testGithubMsg)
+				}
 				logTestParameters(l, params)
-				issue, err := github.MaybePost(t, issueInfo, l, output, params)
+				issue, err := github.MaybePost(t, issueInfo, l, githubMsg, params)
 				if err != nil {
 					shout(ctx, l, stdout, "failed to post issue: %s", err)
 					atomic.AddInt32(&r.numGithubPostErrs, 1)
@@ -1490,10 +1495,18 @@ func (r *testRunner) runTest(
 	// From now on, all logging goes to test-teardown.log to give a clear separation between
 	// operations originating from the test vs the harness. The only error that can originate here
 	// is from artifact collection, which is best effort and for which we do not fail the test.
+	// TODO(wchoe): improve log destination consistency, above comment doesn't take deferred calls into account
+	// testRunner.runTest's deferred calls write to the original test.log, not test-teardown.log
+	// and the deferred calls aren't necessarily related to test teardown so the
+	// correct log to write to is ambiguous
 	replaceLogger("test-teardown")
 	if err := r.teardownTest(ctx, t, c, timedOut); err != nil {
 		l.PrintfCtx(ctx, "error during test teardown: %v; see test-teardown.log for details", err)
 	}
+	if err := r.inspectArtifacts(ctx, t, l); err != nil {
+		// inspect artifacts and potentially add helpful triage information for failed tests
+		l.PrintfCtx(ctx, "error during artifact inspection: %v", err)
+	}
 }
 
 // getVMNames returns a comma separated list of VM names.
@@ -1723,6 +1736,76 @@ func (r *testRunner) teardownTest(
 	return nil
 }
 
+// inspectArtifacts inspects node logs and attempts to write helpful triage
+// information to the test log and testRunner.githubMessage
+// This method is best effort and should not fail a test.
+// This method writes to both testLogger which is expected to be test.log and
+// t.L() which is test-teardown.log since inspectArtifacts is called after
+// teardownTest
+func (r *testRunner) inspectArtifacts(
+	ctx context.Context, t *testImpl, testLogger *logger.Logger,
+) error {
+
+	if t.Failed() || roachtestflags.AlwaysCollectArtifacts {
+		t.L().Printf("Attempting to gather node fatal level logs for triage.")
+		out, err := gatherFatalNodeLogs(t, testLogger)
+		if err != nil {
+			return err
+		}
+		if out == "" {
+			t.L().Printf("No fatal level logs found.")
+			return nil
+		} else {
+			testLogger.PrintfCtx(ctx, "CockroachDB contains Fatal level logs. Up to the first 10 "+
+				"will be shown here. See node logs in artifacts for more details.\n%s", out)
+			t.appendGithubMessage(out)
+			return nil
+		}
+	}
+	return nil
+}
+
+// gatherFatalNodeLogs attempts to gather fatal level node logs to help with
+// triage
+func gatherFatalNodeLogs(t *testImpl, testLogger *logger.Logger) (string, error) {
+	logPattern := `^F[0-9]{6}`
+	filePattern := "logs/*unredacted/cockroach*.log"
+	// *unredacted captures patterns for single node and multi-node clusters
+	// e.g. unredacted, 1.unredacted
+	joinedFilePath := filepath.Join(t.ArtifactsDir(), filePattern)
+	targetFiles, err := filepath.Glob(joinedFilePath)
+	if err != nil {
+		return "", err
+	} else if len(targetFiles) == 0 {
+		return "", errors.Newf("No matching log files found for log pattern: %s and file pattern: %s",
+			logPattern, filePattern)
+	}
+	args := append([]string{"-E", "-m", "10", "-a", logPattern}, targetFiles...)
+	command := "grep"
+	t.L().Printf("Gathering fatal level logs with command: %q %s", command, strings.Join(args, " "))
+	// Works with local and remote node clusters because we will always download
+	// the artifacts if there's a test failure (except for timeout)
+	cmd := exec.Command("grep", args...)
+	out, err := cmd.CombinedOutput()
+	if err != nil {
+		var ee *exec.ExitError
+		if errors.As(err, &ee) && ee.ExitCode() == 1 {
+			testLogger.Printf("No fatal level logs found.")
+			// grep exits with status 1 when no lines match; that is not an error here
+			return "", nil
+		}
+		return "", err
+	}
+	// trim file path from output for readability
+	lines := strings.Split(string(out), "\n")
+	for i, line := range lines {
+		if idx := strings.IndexByte(line, ':'); idx >= 0 {
+			lines[i] = strings.TrimLeft(line[idx+1:], " \t")
+		}
+	}
+	return strings.Join(lines, "\n"), err
+}
+
 // maybeSaveClusterDueToInvariantProblems detects rare conditions (such as
 // storage durability crashes) on the cluster and if one is detected,
 // unconditionally preserves the cluster for future debugging. It also creates
diff --git a/pkg/cmd/roachtest/testdata/github/node_fatal b/pkg/cmd/roachtest/testdata/github/node_fatal
@@ -0,0 +1,65 @@
+# Test failure due to node fatal should include fatal node logs in github issue
+
+add-failure name=(oops) type=(node-fatal)
+----
+ok
+
+post
+----
+----
+roachtest.github_test [failed]() on test_branch @ [test_SHA]():
+
+Failed with:
+
+```
+(monitor.go:267).Wait: monitor failure: dial tcp 127.0.0.1:29000: connect: connection refused
+test artifacts and logs in: artifacts/roachtest/manual/monitor/test-failure/node-fatal-explicit-monitor/cpu_arch=arm64/run_1
+```
+Fatal entries found in Cockroach logs:
+
+```
+F250826 19:49:07.194443 3106 sql/sem/builtins/builtins.go:6063 ⋮ [T1,Vsystem,n1,client=127.0.0.1:54552,hostssl,user=?roachprod?] 250  force_log_fatal(): ?oops?
+```
+
+Parameters:
+ - <code>arch=amd64</code>
+ - <code>cloud=gce</code>
+ - <code>coverageBuild=false</code>
+ - <code>cpu=4</code>
+ - <code>encrypted=false</code>
+ - <code>fs=ext4</code>
+ - <code>localSSD=true</code>
+ - <code>runtimeAssertionsBuild=false</code>
+ - <code>ssd=0</code>
+<details><summary>Help</summary>
+<p>
+
+
+See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)
+
+
+
+See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
+
+
+
+See: [Grafana](https://go.crdb.dev/roachtest-grafana//github-test/1689957243000/1689957853000)
+
+</p>
+</details>
+/cc @cockroachdb/unowned
+<sub>
+
+[This test on roachdash](https://roachdash.crdb.dev/?filter=status:open%20t:.*github_test.*&sort=title+created&display=lastcommented+project) | [Improve this report!](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/bazci/githubpost/issues)
+
+</sub>
+
+------
+Labels:
+- <code>O-roachtest</code>
+- <code>C-test-failure</code>
+- <code>release-blocker</code>
+Rendered:
+https://github.com/cockroachdb/cockroach/issues/new?body=roachtest.github_test+%5Bfailed%5D%28%29+on+test_branch+%40+%5Btest_SHA%5D%28%29%3A%0A%0AFailed+with%3A%0A%0A%60%60%60%0A%28monitor.go%3A267%29.Wait%3A+monitor+failure%3A+dial+tcp+127.0.0.1%3A29000%3A+connect%3A+connection+refused%0Atest+artifacts+and+logs+in%3A+artifacts%2Froachtest%2Fmanual%2Fmonitor%2Ftest-failure%2Fnode-fatal-explicit-monitor%2Fcpu_arch%3Darm64%2Frun_1%0A%60%60%60%0AFatal+entries+found+in+Cockroach+logs%3A%0A%0A%60%60%60%0AF250826+19%3A49%3A07.194443+3106+sql%2Fsem%2Fbuiltins%2Fbuiltins.go%3A6063+%E2%8B%AE+%5BT1%2CVsystem%2Cn1%2Cclient%3D127.0.0.1%3A54552%2Chostssl%2Cuser%3D%3Froachprod%3F%5D+250++force_log_fatal%28%29%3A+%3Foops%3F%0A%60%60%60%0A%0AParameters%3A%0A+-+%3Ccode%3Earch%3Damd64%3C%2Fcode%3E%0A+-+%3Ccode%3Ecloud%3Dgce%3C%2Fcode%3E%0A+-+%3Ccode%3EcoverageBuild%3Dfalse%3C%2Fcode%3E%0A+-+%3Ccode%3Ecpu%3D4%3C%2Fcode%3E%0A+-+%3Ccode%3Eencrypted%3Dfalse%3C%2Fcode%3E%0A+-+%3Ccode%3Efs%3Dext4%3C%2Fcode%3E%0A+-+%3Ccode%3ElocalSSD%3Dtrue%3C%2Fcode%3E%0A+-+%3Ccode%3EruntimeAssertionsBuild%3Dfalse%3C%2Fcode%3E%0A+-+%3Ccode%3Essd%3D0%3C%2Fcode%3E%0A%3Cdetails%3E%3Csummary%3EHelp%3C%2Fsummary%3E%0A%3Cp%3E%0A%0A%0ASee%3A+%5Broachtest+README%5D%28https%3A%2F%2Fgithub.com%2Fcockroachdb%2Fcockroach%2Fblob%2Fmaster%2Fpkg%2Fcmd%2Froachtest%2FREADME.md%29%0A%0A%0A%0ASee%3A+%5BHow+To+Investigate+%5C%28internal%5C%29%5D%28https%3A%2F%2Fcockroachlabs.atlassian.net%2Fl%2Fc%2FSSSBr8c7%29%0A%0A%0A%0ASee%3A+%5BGrafana%5D%28https%3A%2F%2Fgo.crdb.dev%2Froachtest-grafana%2F%2Fgithub-test%2F1689957243000%2F1689957853000%29%0A%0A%3C%2Fp%3E%0A%3C%2Fdetails%3E%0A%2Fcc+%40cockroachdb%2Funowned%0A%3Csub%3E%0A%0A%5BThis+test+on+roachdash%5D%28https%3A%2F%2Froachdash.crdb.dev%2F%3Ffilter%3Dstatus%3Aopen%2520t%3A.%2Agithub_test.%2A%26sort%3Dtitle%2Bcreated%26display%3Dlastcommented%2Bproject%29+%7C+%5BImprove+this+report%21%5D%28https%3A%2F%2Fgithub.com%2Fcockroachdb%2Fcockroach%2Ftree%2Fmaster%2Fpkg%2Fcmd%2Fbazci%2
Fgithubpost%2Fissues%29%0A%0A%3C%2Fsub%3E%0A%0A------%0ALabels%3A%0A-+%3Ccode%3EO-roachtest%3C%2Fcode%3E%0A-+%3Ccode%3EC-test-failure%3C%2Fcode%3E%0A-+%3Ccode%3Erelease-blocker%3C%2Fcode%3E%0A&template=none&title=roachtest%3A+github_test+failed
+----
+----
diff --git a/pkg/cmd/roachtest/tests/roachtest.go b/pkg/cmd/roachtest/tests/roachtest.go
@@ -15,6 +15,7 @@ import (
 	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/registry"
 	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test"
 	"github.com/cockroachdb/cockroach/pkg/roachprod/install"
+	"github.com/stretchr/testify/require"
 )
 
 func registerRoachtest(r registry.Registry) {
@@ -53,4 +54,62 @@ func registerRoachtest(r registry.Registry) {
 		Timeout: 3 * time.Minute,
 		Cluster: r.MakeClusterSpec(3),
 	})
+
+	// Manual test for verifying framework behavior in a test failure scenario
+	// via unexpected node fatal error from an explicitly monitored node
+	r.Add(registry.TestSpec{
+		Name:             "roachtest/manual/monitor/test-failure/node-fatal-explicit-monitor",
+		Owner:            registry.OwnerTestEng,
+		Cluster:          r.MakeClusterSpec(1),
+		CompatibleClouds: registry.AllClouds,
+		Suites:           registry.ManualOnly,
+		Run: func(ctx context.Context, t test.Test, c cluster.Cluster) {
+			monitorFatalTest(ctx, t, c)
+		},
+	})
+
+	// Manual test for verifying framework behavior in a test failure scenario
+	// via unexpected node fatal error using the roachtest built-in node monitor
+	r.Add(registry.TestSpec{
+		Name:             "roachtest/manual/monitor/test-failure/node-fatal-global-monitor",
+		Owner:            registry.OwnerTestEng,
+		Cluster:          r.MakeClusterSpec(1),
+		CompatibleClouds: registry.AllClouds,
+		Suites:           registry.ManualOnly,
+		Monitor:          true,
+		Run: func(ctx context.Context, t test.Test, c cluster.Cluster) {
+			monitorFatalTestGlobal(ctx, t, c)
+		},
+	})
+}
+
+// monitorFatalTest will always fail with a node logging a fatal error in a
+// goroutine that is being watched by a monitor
+func monitorFatalTest(ctx context.Context, t test.Test, c cluster.Cluster) {
+	c.Start(ctx, t.L(), option.DefaultStartOpts(), install.MakeClusterSettings())
+	m := c.NewDeprecatedMonitor(ctx, c.Node(1))
+	n1 := c.Conn(ctx, t.L(), 1)
+	defer n1.Close()
+	require.NoError(t, n1.PingContext(ctx))
+
+	m.Go(func(ctx context.Context) (err error) {
+		_, err = n1.ExecContext(ctx, "SELECT crdb_internal.force_log_fatal('oops');")
+		return err
+	})
+	m.Wait()
+}
+
+// monitorFatalTestGlobal will always fail with a node logging a fatal error
+// not within an explicit goroutine. Expects registry.TestSpec.Monitor to be
+// set to True
+func monitorFatalTestGlobal(ctx context.Context, t test.Test, c cluster.Cluster) {
+	c.Start(ctx, t.L(), option.DefaultStartOpts(), install.MakeClusterSettings())
+	n1 := c.Conn(ctx, t.L(), 1)
+	defer n1.Close()
+	require.NoError(t, n1.PingContext(ctx))
+
+	_, err := n1.ExecContext(ctx, "SELECT crdb_internal.force_log_fatal('oops');")
+	if err != nil {
+		t.L().Printf("Error executing query: %s", err)
+	}
 }
diff --git a/pkg/sql/stmtdiagnostics/BUILD.bazel b/pkg/sql/stmtdiagnostics/BUILD.bazel
diff --git a/pkg/sql/stmtdiagnostics/statement_diagnostics.go b/pkg/sql/stmtdiagnostics/statement_diagnostics.go
diff --git a/pkg/sql/stmtdiagnostics/txn_diagnostics.go b/pkg/sql/stmtdiagnostics/txn_diagnostics.go
diff --git a/pkg/sql/stmtdiagnostics/txn_diagnostics_test.go b/pkg/sql/stmtdiagnostics/txn_diagnostics_test.go
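
The fatal-log scan in `gatherFatalNodeLogs` shells out to `grep -E -m 10 -a` with the pattern `^F[0-9]{6}`. As a rough illustration of what that filter selects, the sketch below applies the same pattern in plain Go to a single log held in a string. `firstFatalLines` and the sample log lines are hypothetical and not part of the commit; the real code runs grep over the downloaded artifact files.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fatalPrefix is the same pattern the diff passes to grep -E.
var fatalPrefix = regexp.MustCompile(`^F[0-9]{6}`)

// firstFatalLines returns at most limit lines of log that begin with the
// Fatal-level prefix, in their original order.
func firstFatalLines(log string, limit int) []string {
	var out []string
	for _, line := range strings.Split(log, "\n") {
		if len(out) >= limit {
			break
		}
		if fatalPrefix.MatchString(line) {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	// Made-up log content: one info line, one fatal line, one warning line.
	log := "I250826 19:49:06.000000 1 server/start.go:10 starting\n" +
		"F250826 19:49:07.194443 3106 builtins.go:6063 force_log_fatal(): oops\n" +
		"W250826 19:49:07.200000 12 util/log.go:5 shutting down\n"
	for _, line := range firstFatalLines(log, 10) {
		fmt.Println(line)
	}
}
```

Note that grep's `-m` limit applies to each input file separately, so the sketch only mirrors the behavior for one file.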
