Abort execution when platform telemetry error #6827

Open
jorgee wants to merge 1 commit into master from nf-356-tower-abort-on-error

@jorgee (Contributor) commented Feb 12, 2026

This pull request adds an abortOnError flag to the TowerClient class to control its error handling. The flag can be set via the TOWER_ABORT_ON_ERROR environment variable. When it is enabled, critical errors encountered while communicating with Seqera Platform abort the workflow immediately by throwing an AbortRunException. The changes also improve error propagation and add tests that verify this behavior.

Error Handling Improvements:

  • Added abortOnError flag to TowerClient, defaulting to true, and made it configurable via the TOWER_ABORT_ON_ERROR environment variable. This determines whether critical errors abort the workflow or are handled as warnings.
  • Updated error handling in TowerClient methods (logHttpResponse, parseTowerResponse, and others) to throw AbortRunException when abortOnError is enabled, ensuring immediate workflow termination on critical errors.
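The flag behavior described in the two points above can be sketched roughly as follows. This is a minimal illustration in plain Java, not the code from this PR: TelemetryClientSketch, parseAbortOnError and requireWorkflowId are hypothetical names, AbortRunException is a local stand-in for the Nextflow class, and the exact parsing rule for TOWER_ABORT_ON_ERROR (unset means true, anything other than "true" means false) is an assumption.

```java
import java.util.Map;

// Local stand-in for Nextflow's AbortRunException (assumption: an unchecked exception).
class AbortRunException extends RuntimeException {
    AbortRunException(String message) { super(message); }
}

// Hypothetical client that mimics the abortOnError behavior described in the PR.
class TelemetryClientSketch {
    private final boolean abortOnError;

    // The environment is passed in as a map so the sketch can be exercised
    // without modifying the real process environment.
    TelemetryClientSketch(Map<String, String> env) {
        this.abortOnError = parseAbortOnError(env);
    }

    // Assumption: an unset variable falls back to true (the default stated in
    // the description); any value other than "true" (case-insensitive) disables aborting.
    static boolean parseAbortOnError(Map<String, String> env) {
        String value = env.get("TOWER_ABORT_ON_ERROR");
        return value == null || Boolean.parseBoolean(value.trim());
    }

    // Returns the workflow id, or handles a missing id according to the flag:
    // throw to abort the run, or print a warning and return null.
    String requireWorkflowId(String workflowId) {
        if (workflowId == null || workflowId.isEmpty()) {
            String msg = "Invalid Seqera Platform API response - Missing workflow Id";
            if (abortOnError) {
                throw new AbortRunException(msg);
            }
            System.err.println("WARN: " + msg);
            return null;
        }
        return workflowId;
    }
}
```

With an empty environment map, `new TelemetryClientSketch(Map.of()).requireWorkflowId(null)` throws AbortRunException; with TOWER_ABORT_ON_ERROR set to false it only prints a warning and returns null.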

Session and Exception Propagation:

  • Modified the Session class to specifically catch and log AbortRunException during observer notification, ensuring these exceptions propagate and abort the workflow as intended.
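The propagation rule described above (log and rethrow AbortRunException, log and swallow everything else) can be sketched as below. This is an illustration in plain Java under stated assumptions, not the actual Session code: SessionSketch, its Runnable-based observers and the loggedErrors counter are hypothetical, and AbortRunException is a local stand-in for the Nextflow class.

```java
import java.util.List;

// Local stand-in for Nextflow's AbortRunException (assumption: an unchecked exception).
class AbortRunException extends RuntimeException {
    AbortRunException(String message) { super(message); }
}

// Hypothetical session that notifies observers, modeled here as plain Runnables.
class SessionSketch {
    private final List<Runnable> observers;
    int loggedErrors = 0; // counts swallowed observer failures, for illustration only

    SessionSketch(List<Runnable> observers) {
        this.observers = observers;
    }

    void notifyEvent() {
        for (Runnable observer : observers) {
            try {
                observer.run();
            } catch (AbortRunException e) {
                // Abort requests are logged and rethrown so the run actually stops.
                System.err.println("Aborting run: " + e.getMessage());
                throw e;
            } catch (Exception e) {
                // Any other observer failure is only logged; remaining observers still run.
                loggedErrors++;
                System.err.println("Observer failed: " + e.getMessage());
            }
        }
    }
}
```

Catching the narrower exception first keeps the existing swallow-and-log behavior for ordinary failures while giving one exception type a way through to the caller.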

Tests:

  • Added new tests in TowerClientTest to verify the correct detection of the abortOnError setting and to ensure that the workflow aborts as expected when errors occur and abortOnError is enabled.

…lemetry errors

Signed-off-by: jorgee <jorge.ejarque@seqera.io>
netlify bot commented Feb 12, 2026

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 6e53f3b
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/698dcf321f27dd00085d4d44

Comment on lines +312 to +314
if( abortOnError ) {
    throw new AbortRunException("Invalid Seqera Platform API response - Missing workflow Id")
}
Member

Why add this condition if an exception was already thrown?

Contributor Author

Same answer as for the previous comment: notifyEvent was catching the exceptions but not rethrowing them, so no error was produced in the execution.

Comment on lines +304 to +306
if( abortOnError ) {
    throw new AbortRunException(resp.message)
}
Member

I don't get what this is adding.

@jorgee (Contributor Author) commented Feb 12, 2026

notifyEvent was catching all exceptions and printing a log message, but not rethrowing them, so ordinary exceptions never produced an error. I changed notifyEvent to catch and rethrow the AbortRunException. I didn't want to rethrow all exceptions or use the AbortOperationException, because it is widely used and doing so would very likely introduce side effects. The AbortRunException was only used in ScriptRunner and Launcher, so it was safe to use, and it matches the meaning of the exception.

I initially implemented it by invoking session.abort from here, but that did not work because the execution had not started yet. The abort was executed and threw the exception, but because the exception was caught in notifyEvent, the execution continued and failed later when trying to run the script in the aborted session. I think it could be fixed by inspecting the session just after invoking start, and throwing an exception if it is aborted.

Both are easy to implement; I just selected this one because it works in the same way for all the events, and it could be reused for other observers. If you think something is wrong or it's better to abort directly, I can try with the other approach.
