Skip to content

Conversation

JordanYates
Copy link
Contributor

@JordanYates JordanYates commented Oct 5, 2025

Add a callback for the periodic script result so that applications have a way of detecting dead links.

The goal of this PR is to enable this pattern to recover after a dead modem link.

static void modem_event_cb(const struct device *dev, enum cellular_event evt, const void *payload,
			   void *user_data)
{
    if(evt == CELLULAR_EVENT_MODEM_COMMS_CHECK_RESULT) {
        const struct cellular_evt_modem_comms_check_result *mccr= payload;

        if (mccr->success == false) {
            /* Do something to shutdown the modem (Exercise for the reader).
             * Don't call `pm_device_runtime_put(dev) though, since that will block this
             * context for up to a minute.
             */
        }
    }
   if(evt == CELLULAR_EVENT_MODEM_SUSPENDED) {
       sys_reboot(SYS_REBOOT_WARM);
   }
}

The advantage of this over just rebooting immediately is that it gives the modem a chance to cleanly disconnect from the LTE network.

@zephyrbot zephyrbot added area: Modem area: Tests Issues related to a particular existing or missing test area: Modem Drivers Release Notes To be mentioned in the release notes labels Oct 5, 2025
rerickson1
rerickson1 previously approved these changes Oct 6, 2025
Comment on lines 95 to 96
/** Periodic script result */
CELLULAR_EVENT_PERIODIC_SCRIPT_RESULT = BIT(2),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This event is internal to the modem_cellular.c driver, unlike CELLULAR_EVENT_REGISTRATION_STATUS_CHANGED which is generic to any modem driver. I believe it would be more portable to make this event generic to what it indicates, rather than what it "is". Something like CELLULAR_EVENT_MODEM_UNRESPONSIVE, or CELLULAR_EVENT_MODEM_DEAD.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The event is called on both success and failure so those names don't work, but I see your point. Would you be happy with CELLULAR_EVENT_MODEM_KEEPALIVE?

Comment on lines 81 to 89
config MODEM_CELLULAR_SUSPEND_ASYNC
bool "Suspend the modem asynchronously instead of blocking"
help
Suspending a cellular modem can be a long running process, and blocking
the context that triggers the suspension can be undesirable. Enabling
this option lets the `PM_DEVICE_ACTION_SUSPEND` handler return
immediately, with notification of the suspension provided through the
`CELLULAR_EVENT_MODEM_SUSPENDED` event.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks a promise of pm_device_action_run(), specifically that the call is synchronous. It looks like this solution has an issue with power domains. If its on a power domain, power will be cut immediately after the call with PM_DEVICE_ACTION_SUSPEND returns. This may prevent the modem driver from reaching the idle state in the first place since the modem will not respond to anything, its not powered. Additional logic can be added to the modem_cellular_pm_action hook to react to the PM_DEVICE_ACTION_TURN_OFF which will be received right before the power domain is turned of, to just "hard reset" the driver and call back with the CELLULAR_EVENT_MODEM_SUSPENDED event at that point.

Could some other call to speed up the suspend process be added to the modem API for example? If the thing that takes a long time is unregistering from the network, maybe call net_if_down() and wait for PPP to be terminated, then call pm_device_runtime_put()? What are the operations that take so long we need to do it async?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks a promise of pm_device_action_run(), specifically that the call is synchronous.

Yes

It looks like this solution has an issue with power domains.

The driver already doesn't work properly with power domains (random IO pins would be held high), so I wasn't too concerned with this.

Additional logic can be added to the modem_cellular_pm_action hook to react to the PM_DEVICE_ACTION_TURN_OFF

I could add this if it would negate the concern in the first point

Could some other call to speed up the suspend process be added to the modem API for example?

That would break PM promises in different ways, you would have the device being suspended while still in the ACTIVE state.

What are the operations that take so long we need to do it async?

No idea in the general case (i.e. I have no idea what the Quectel AT+QPOWD=1 command would be doing or how long it would take), but all the drivers have a shutdown_ms parameter on the order of several seconds that the driver just waits for after pulsing the power pin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shutdown_ms is a fallback for modems which don't have a shutdown script, or to force the modem off if the shutdown script fails. "AT+QPOWD=1" is the commend sent to shut down a Quectel modem, without using the power or reset pins, once it returns OK, the modem is off and idle is entered, no shutdown_ms used.

Regarding "backpowering", that is not handled in the driver at this point. I think a good solution would be to implement it in the pm action handler on the TURN_OFF event like the example driver here https://docs.zephyrproject.org/latest/services/pm/device.html#device-model-with-device-power-management-support, I don't think we should use that as an argument to make the driver even less compatible with device PM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shutdown_ms is a fallback for modems which don't have a shutdown script, or to force the modem off if the shutdown script fails. "AT+QPOWD=1" is the commend sent to shut down a Quectel modem, without using the power or reset pins, once it returns OK, the modem is off and idle is entered, no shutdown_ms used.

That is incorrect, Quectel modules can take up to 60 seconds to properly disconnect from the network and must remain powered for that entire duration, regardless of the quick return value of "OK". If the driver is not currently doing this, the driver is wrong.
image

It is a similar story with the Telit modem I am using, although the duration is not as long.
image

Regarding "backpowering", that is not handled in the driver at this point.

That's fine, and I'm not trying to handle it, power isn't physically removed from the module, just the reset pin / PWR pin toggling behaviour to shut down.

I don't think we should use that as an argument to make the driver even less compatible with device PM.

Thats fair, but I also don't think the device PM abstraction is really designed to handle ACTION_SUSPEND blocking for up to a minute.
My argument was that since we aren't supporting ACTION_TURN_OFF anyway, there is no real cost to returning early from ACTION_SUSPEND, at least in the context of attempting to perform an orderly modem shutdown before rebooting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking quectel as an isolated case, the rather long timeout has little to do with suspending the modem, its to do with unregistering from the network, the expected time to suspend are those 300ms, which is closer to what I have seen with those modems.

Waiting a minute is definitely not expected for calling suspend, which is why I suggested instead detaching from the network first, then calling SUSPEND. Would it not make sense to add an API (if we don't have something fitting already) to force the modem to deregister?

@JordanYates JordanYates changed the title modem: modem_cellular: periodic script error detection and async shutdown modem: modem_cellular: periodic script error detection Oct 19, 2025
@JordanYates
Copy link
Contributor Author

Dropped the async pm_device_runtime_put changes from the PR.
Recovering the modem without blocking the system workqueue or deadlocking the modem subsystem from the event handler itself is now left as an exercise for the reader :)

Add a callback for the periodic script result so that applications have
a way of detecting dead links.

Signed-off-by: Jordan Yates <[email protected]>
@JordanYates JordanYates added this to the v4.3.0 milestone Oct 19, 2025
Copy link

@tomi-font tomi-font requested review from SeppoTakalo and removed request for tomi-font October 20, 2025 09:39
Comment on lines +142 to +143
/** Result of a communications link check to the modem */
CELLULAR_EVENT_MODEM_COMMS_CHECK_RESULT = BIT(2),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the user care if the script was sent successfully? would it not be enough to have CELLULAR_EVENT_MODEM_COMMS_CHECK_FAILED and omit the data passed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. comms checks could periodically fail for any number of reasons. I don't care if one periodic script fails and then it starts working again. I do care if 10 in a row fail. Without knowing when they succeed, you can't reset counters on success.

@jhedberg jhedberg merged commit 3973aaa into zephyrproject-rtos:main Oct 21, 2025
26 checks passed
@JordanYates JordanYates deleted the 251005_modem_events branch October 21, 2025 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: Modem Drivers area: Modem area: Tests Issues related to a particular existing or missing test Release Notes To be mentioned in the release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants