Skip to content

Conversation

@SeppoTakalo
Copy link
Contributor

Pull fixes from upstream to coap_client used by nRF Cloud libraries.

Internal issue reported: NCSDK-29890

@rlubos
Copy link
Contributor

rlubos commented Oct 31, 2024

I'm puzzled why wasn't the compliance issue reported in the upstream PR?

LINE_SPACING: Missing a blank line after declarations
File:tests/net/lib/coap_client/src/stubs.c

About the Commit tags failure, were the commits modified somehow? Note, we should not do that, if there are conflicts, pull dependencies.

@rlubos
Copy link
Contributor

rlubos commented Oct 31, 2024

@carlescufi IMO this qualifies for an exception from the conflict-free cherry-pick rule, fixing dependencies would require to pull this tree-wide change, which does not apply cleanly on it's own:
zephyrproject-rtos/zephyr@c4803752a86
IMO pursuing that further is pointess, let's just ignore the check for this PR.

hubertmis and others added 19 commits November 1, 2024 12:02
Constrained application can require to avoid waiting for response
from a server. CoAP No-response option defined in RFC 7967 allows
to do that by suppressing selected response types.

Signed-off-by: Hubert Miś <[email protected]>
(cherry picked from commit 70c9bf1)
This commit makes sure we continue to wait for extra confirmations even
after the request is done so we can handle duplicate confirmations if any.

Detailed description:

rfc7252#section-4.5 specifies that:

"The recipient SHOULD acknowledge each duplicate copy of a
 Confirmable message".

So if, for example, the client sends to a multicast destination address,
the server will get multiple requests and will confirm all of them.

Without this commit, the client will set the request to done after
receiving the first answer.
From here the request object will be marked as free and the duplicate
acknowledgements will stay buffered in the network stack.
Once the client tries to send a new request, it will unbuffer those
duplicate acknowledgements but now the request object is unallocated
so the client won't be able to handle those acknowledgements as duplicates.
It will instead treat it as an unexpected ACK.

To work around this issue, rfc7252#section-4.8.2 states that:

"EXCHANGE_LIFETIME is the time from starting to send a Confirmable
 message to the time when an acknowledgement is no longer expected,
 i.e., message-layer information about the message exchange can be
 purged."

Keeping the request object allocated for EXCHANGE_LIFETIME ensures that
duplicate acknowledgements can be handled accordingly.

This commit adds a basic implementation of what is stated in the RFC.

EXCHANGE_LIFETIME has been arbitrarily set to 3 * ACK_TIMEOUT which
seems more reasonable than the 247 seconds stated in the RFC.

Signed-off-by: Francois Gervais <[email protected]>
(cherry picked from commit bfed8d0)
…llocated

As confirmable requests will stay allocated for (3 * ACK_TIMEOUT), we
need to adjust the timings so all requests are unallocated by the end
of the test.

Signed-off-by: Francois Gervais <[email protected]>
(cherry picked from commit 13cad59)
Passing MACRO arguments that need expansion require _CONCAT.

Signed-off-by: Pieter De Gendt <[email protected]>
(cherry picked from commit 476c12f)
The goal of this commit is to reduce the overall test time by optimizing
the time spent sleeping.

However while doing so, a few glitches became apparent and those have
been fixed as well.

1. The way the ZSOCK_POLLIN event is managed has been modified

In most tests, the event would stick for the whole test and this would
sometimes results in the client receiving more data than intended which
in turn would results in various unexpected warnings and errors.

The management of the ZSOCK_POLLIN event has now been mostly moved to
the send/receive overrides which better represent how things would
happen outside of the test scenario.

For example, when sending data, if this send would normally result in
receiving some response, the send function sets the ZSOCK_POLLIN event.
Then when receiving, we clear the event as the data has been received.

There are a few exceptions to this for cases where we need to simulate
some specific abnormal behavior.

2. The `messages_needing_response` queue now includes a valid bit

The test manages a queue of messages id to be able to respond to the
correct id in the receive hooks.

It was built in a way that the id 0 would be treated as invalid
although it is a valid id.
In practice, it would not be an issue in most cases however, it would
result in an unexpected behavior if the `test_multiple_requests` test
would be run first.

To fix this issue in the simplest way, the queue has been changed to
uint32_t which gives us 16 extra bits for the management of the queue,
1 of which is used to tell if the entry is valid or not.

Signed-off-by: Francois Gervais <[email protected]>
(cherry picked from commit 736344b)
…rable

Extend the `coap_transmission_parameters` struct with the field
`ack_random_percent`. This was the last remaining CoAP transmission
parameter that was not configurable at runtime.

Signed-off-by: Adrian Friedli <[email protected]>
(cherry picked from commit 98289e5)
Before this patch, any unexpected socket error during poll (caused by
LTE disconnects for instance) would lead to a infinite loop because the
error state was never cleared, handled or even signaled to the user.

Signed-off-by: Benjamin Lindqvist <[email protected]>
(cherry picked from commit f8a7035)
Protect global list of clients with mutex.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 1ea569d)
…ailed to send

When transmission of first request fails, reset the internal request
buffer as there is no ongoing CoAP transaction.

Application can deal with the failure.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 46b7c84)
…etrying CoAP msg

Refactor the CoAP retry handling into the handle_poll() function,
so that we only try to send retries if the socket reports POLLOUT.

Also move the receiving into same loop, so when poll() reports POLLIN
we recv() the message and handle it before proceeding to other sockets.

Also fix tests to handle POLLOUT flag and add support for testing
multiple clients.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 4c6dd4c)
…ng loop

Forward recv() errors to handle_poll(), so there is only one place to
handle error codes.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 6481b0e)
… send() failure

If send() fails, we have not technically send the CoAP retry yet, so
restore the same pending structure, so our timeouts and retry counters
stay the same.

This will trigger a retry next time the poll() return POLLOUT, so we
know that we can send.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 623a1ff)
…stead of flagging

It is error prone to flag separate booleans, so try to use
reset_internal_request() every time we release the internal request
structure.

Also refactor the reset_internal_request() so that we reset the
timeout value so it does not trigger again.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit a14f083)
Fix handling of received CoAP reset.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 1890dbd)
Add helper API to construct CoAP Reset message.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit e96e95b)
Response tokens are already compared in get_request_with_token().

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 1dc2487)
When receiving unknown response, respond with CoAP Reset.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 350d20e)
If we receive poll() error during a request, it must
be forwarded to application.

If instead receive poll() error later, it should not be
forwarded as it might be result of application closing the
socket.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 9c9dc9f)
In receiving thread, continuing the loops is based on
has_ongoing_exchanges() so it does not need atomic
coap_client_recv_active variable.

When idling, it wakes from semaphore. But there was potential
deadlock when coap_client_schedule_poll() would not signal the
semaphore, if atomic variable was already showing that it runs.
Removing the atomic variable removes this deadlock.

Signed-off-by: Seppo Takalo <[email protected]>
(cherry picked from commit 1e5a537)
@SeppoTakalo
Copy link
Contributor Author

Updated the PR.
Now contains only [nrf fromtree] as the upstream PR is now merged.

Also fixed some merging issues as I previously had some commits in wrong order. Should apply more cleanly now.

@SeppoTakalo
Copy link
Contributor Author

a bit unfortunate that our CI runs those compliance checks that upsream did not. Now it complains about styling on commits that are already merged in.

@SeppoTakalo SeppoTakalo added this to the ncs-2.8.0 milestone Nov 1, 2024
@rlubos
Copy link
Contributor

rlubos commented Nov 4, 2024

a bit unfortunate that our CI runs those compliance checks that upsream did not. Now it complains about styling on commits that are already merged in.

I guess there's no way around then other than taking this as is.

@rlubos rlubos added the backport v3.7.99-ncs1-branch Relates to NCS v2.8-branch label Nov 4, 2024
@rlubos rlubos merged commit 53b26fb into nrfconnect:main Nov 4, 2024
18 of 19 checks passed
@SeppoTakalo SeppoTakalo deleted the coap_client_nrf_fix branch November 4, 2024 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport v3.7.99-ncs1-branch Relates to NCS v2.8-branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants