-
Notifications
You must be signed in to change notification settings - Fork 8.1k
modem: modem_cellular: periodic script error detection #97024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
include/zephyr/drivers/cellular.h
Outdated
/** Periodic script result */ | ||
CELLULAR_EVENT_PERIODIC_SCRIPT_RESULT = BIT(2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This event is internal to the modem_cellular.c driver, unlike CELLULAR_EVENT_REGISTRATION_STATUS_CHANGED
which is generic to any modem driver. I believe it would be more portable to make this event generic to what it indicates, rather than what it "is". Something like CELLULAR_EVENT_MODEM_UNRESPONSIVE
, or CELLULAR_EVENT_MODEM_DEAD
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The event is called on both success and failure so those names don't work, but I see your point. Would you be happy with CELLULAR_EVENT_MODEM_KEEPALIVE
?
drivers/modem/Kconfig.cellular
Outdated
config MODEM_CELLULAR_SUSPEND_ASYNC | ||
bool "Suspend the modem asynchronously instead of blocking" | ||
help | ||
Suspending a cellular modem can be a long running process, and blocking | ||
the context that triggers the suspension can be undesirable. Enabling | ||
this option lets the `PM_DEVICE_ACTION_SUSPEND` handler return | ||
immediately, with notification of the suspension provided through the | ||
`CELLULAR_EVENT_MODEM_SUSPENDED` event. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks a promise of pm_device_action_run(), specifically that the call is synchronous. It looks like this solution has an issue with power domains. If its on a power domain, power will be cut immediately after the call with PM_DEVICE_ACTION_SUSPEND returns. This may prevent the modem driver from reaching the idle state in the first place since the modem will not respond to anything, its not powered. Additional logic can be added to the modem_cellular_pm_action
hook to react to the PM_DEVICE_ACTION_TURN_OFF which will be received right before the power domain is turned of, to just "hard reset" the driver and call back with the CELLULAR_EVENT_MODEM_SUSPENDED
event at that point.
Could some other call to speed up the suspend process be added to the modem API for example? If the thing that takes a long time is unregistering from the network, maybe call net_if_down() and wait for PPP to be terminated, then call pm_device_runtime_put()? What are the operations that take so long we need to do it async?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks a promise of pm_device_action_run(), specifically that the call is synchronous.
Yes
It looks like this solution has an issue with power domains.
The driver already doesn't work properly with power domains (random IO pins would be held high), so I wasn't too concerned with this.
Additional logic can be added to the
modem_cellular_pm_action
hook to react to thePM_DEVICE_ACTION_TURN_OFF
I could add this if it would negate the concern in the first point
Could some other call to speed up the suspend process be added to the modem API for example?
That would break PM promises in different ways, you would have the device being suspended while still in the ACTIVE
state.
What are the operations that take so long we need to do it async?
No idea in the general case (i.e. I have no idea what the Quectel AT+QPOWD=1
command would be doing or how long it would take), but all the drivers have a shutdown_ms
parameter on the order of several seconds that the driver just waits for after pulsing the power pin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shutdown_ms
is a fallback for modems which don't have a shutdown script, or to force the modem off if the shutdown script fails. "AT+QPOWD=1" is the commend sent to shut down a Quectel modem, without using the power or reset pins, once it returns OK, the modem is off and idle is entered, no shutdown_ms
used.
Regarding "backpowering", that is not handled in the driver at this point. I think a good solution would be to implement it in the pm action handler on the TURN_OFF event like the example driver here https://docs.zephyrproject.org/latest/services/pm/device.html#device-model-with-device-power-management-support, I don't think we should use that as an argument to make the driver even less compatible with device PM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shutdown_ms is a fallback for modems which don't have a shutdown script, or to force the modem off if the shutdown script fails. "AT+QPOWD=1" is the commend sent to shut down a Quectel modem, without using the power or reset pins, once it returns OK, the modem is off and idle is entered, no shutdown_ms used.
That is incorrect, Quectel modules can take up to 60 seconds to properly disconnect from the network and must remain powered for that entire duration, regardless of the quick return value of "OK". If the driver is not currently doing this, the driver is wrong.
It is a similar story with the Telit modem I am using, although the duration is not as long.
Regarding "backpowering", that is not handled in the driver at this point.
That's fine, and I'm not trying to handle it, power isn't physically removed from the module, just the reset pin / PWR pin toggling behaviour to shut down.
I don't think we should use that as an argument to make the driver even less compatible with device PM.
Thats fair, but I also don't think the device PM abstraction is really designed to handle ACTION_SUSPEND
blocking for up to a minute.
My argument was that since we aren't supporting ACTION_TURN_OFF
anyway, there is no real cost to returning early from ACTION_SUSPEND
, at least in the context of attempting to perform an orderly modem shutdown before rebooting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking quectel as an isolated case, the rather long timeout has little to do with suspending the modem, its to do with unregistering from the network, the expected time to suspend are those 300ms, which is closer to what I have seen with those modems.
Waiting a minute is definitely not expected for calling suspend, which is why I suggested instead detaching from the network first, then calling SUSPEND. Would it not make sense to add an API (if we don't have something fitting already) to force the modem to deregister?
5472712
to
f03c515
Compare
Dropped the async |
Add a callback for the periodic script result so that applications have a way of detecting dead links. Signed-off-by: Jordan Yates <[email protected]>
f03c515
to
354f0d0
Compare
|
/** Result of a communications link check to the modem */ | ||
CELLULAR_EVENT_MODEM_COMMS_CHECK_RESULT = BIT(2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the user care if the script was sent successfully? would it not be enough to have CELLULAR_EVENT_MODEM_COMMS_CHECK_FAILED
and omit the data passed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. comms checks could periodically fail for any number of reasons. I don't care if one periodic script fails and then it starts working again. I do care if 10 in a row fail. Without knowing when they succeed, you can't reset counters on success.
Add a callback for the periodic script result so that applications have a way of detecting dead links.
The goal of this PR is to enable this pattern to recover after a dead modem link.
The advantage of this over just rebooting immediately is that it gives the modem a chance to cleanly disconnect from the LTE network.