Skip to content

Conversation

@LeoBriandFiveO
Copy link
Contributor

When receiving data over the eswifi module, we currently read the data first, then allocate a buffer, and finally write the data into the buffer. The issue is that if we can't allocate the buffer, the data that was read is lost. To fix this, we should first attempt to allocate the buffer before reading any data. If we can't allocate the buffer, we should not proceed with reading the data.

@github-actions
Copy link

Hello @LeoBRIANDSmile, and thank you very much for your first pull request to the Zephyr project!
Our Continuous Integration pipeline will execute a series of checks on your Pull Request commit messages and code, and you are expected to address any failures by updating the PR. Please take a look at our commit message guidelines to find out how to format your commit messages, and at our contribution workflow to understand how to update your Pull Request. If you haven't already, please make sure to review the project's Contributor Expectations and update (by amending and force-pushing the commits) your pull request if necessary.
If you are stuck or need help please join us on Discord and ask your question there. Additionally, you can escalate the review when applicable. 😊

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this means that we always try to allocate the "full" buffer regardless of how much data there is, this could be wasting lot of net_buf's.
Is it possible to figure out how much data there is to read (without actually reading it), and then allocate just that amount?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, I’ll check tomorrow if there is an API to check the size of the pending packet or if we can resize the buffer we have allocated. Thanks !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I pushed a new commit to fix the issue of allocation. My strategy is to allocate the buffer with MTU, then read and then resize with the API net_pkt_update_length() so that we allocate the returned length of read function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I did a mistake, net_pkt_update_length() only modify length attribute of net_pkt, I'll try to resize the pkt by creating a new pkt to the exact size thanks to read return and then copy the old pkt into the new pkt and then unref the old pkt

@loicpoulain
Copy link
Contributor

Can you also double check or confirm that if we discard packet reading (e.g forced allocation failure), we are actually able to retrieve the packet in next/subsequent 'read_work' procedure (and don't fuzz the chip state).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this, you can call net_pkt_compact that should remove empty net_buf from net_pkt. This should be called after net_pkt_write.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not better to use net_pkt_trim_buffer() after net_pkt write ? It checks for unused buffers and deallocate them relevantly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trim is ok too, I just forgot it existed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I set pkt to NULL in case of non NULL pkt before the do recv callback

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add pkt_trim in order to unallocate unused buffers

@LeoBriandFiveO
Copy link
Contributor Author

Test failed on : tests/net/lib/http_server/tls/net.http.server.tls but locally it works.

START - test_tls
E: Page fault at address 0x18a00c (error code 0x0)
E: Linear address not present in page tables
E: Access violation: supervisor thread not allowed to read

PR #76528 got the same issue with the same address page fault, is that weird ?

@LeoBriandFiveO LeoBriandFiveO requested a review from jukkar August 1, 2024 08:19
@LeoBriandFiveO
Copy link
Contributor Author

Can you also double check or confirm that if we discard packet reading (e.g forced allocation failure), we are actually able to retrieve the packet in next/subsequent 'read_work' procedure (and don't fuzz the chip state).

Currently if we have an allocation failure we are going to the label "done", so we never reach __read_data function in that situation. I assume that it doesn't imply packet loss in next read_work.

Copy link
Member

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok, just squash the commits together for the final review.

@jukkar
Copy link
Member

jukkar commented Aug 1, 2024

Test failed on : tests/net/lib/http_server/tls/net.http.server.tls but locally it works.

I think this is fixed by #76580

When receiving data over the eswifi module, we currently read the data
first, then allocate a buffer, and finally write the data into the
buffer. The issue is that if we can't allocate the buffer, the data
that was read is lost. To fix this, we should first attempt to allocate
the buffer before reading any data. If we can't allocate the buffer, we
should not proceed with reading the data. By allocating a buffer with
the MTU size, we can read the packet, write it into the allocated buffer
and then resize by removing unused allocated buffer with
net_pkt_trim_buffer().

Signed-off-by: Léo BRIAND <[email protected]>
@LeoBriandFiveO LeoBriandFiveO requested a review from jukkar August 2, 2024 08:23
@MaureenHelm MaureenHelm merged commit cf0c2a5 into zephyrproject-rtos:main Aug 2, 2024
@github-actions
Copy link

github-actions bot commented Aug 2, 2024

Hi @LeoBRIANDSmile!
Congratulations on getting your very first Zephyr pull request merged 🎉🥳. This is a fantastic achievement, and we're thrilled to have you as part of our community!

To celebrate this milestone and showcase your contribution, we'd love to award you the Zephyr Technical Contributor badge. If you're interested, please claim your badge by filling out this form: Claim Your Zephyr Badge.

Thank you for your valuable input, and we look forward to seeing more of your contributions in the future! 🪁

@LeoBriandFiveO
Copy link
Contributor Author

LeoBriandFiveO commented Aug 16, 2024 via email

@davidalo
Copy link

Hi davidalo, logs show us that we can’t allocate buffer but actually it is not a failure because we will try to allocate a new buffer the next time we run off_read_work function (take a look at my commit). So you are true to trace back this point because it should not be an error message but an informative message I guess.

On Thursday, August 15, 2024, davidalo @.> wrote: Since this change I'm getting this error: [00:00:45.112,000] net_pkt: Data buffer (1460) allocation failed. [00:00:45.112,000] wifi_eswifi: Cannot allocate rx packet Any clue? — Reply to this email directly, view it on GitHub <#76483 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/BHRY7TFNKX6AWY2OD6EOB7LZRTATTAVCNFSM6AAAAABLWLUMEOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJRGQ4TQMBSGE . You are receiving this because you were mentioned.Message ID: @.>

Hello @LeoBRIANDSmile . Thanks for the information. I resolved the issue by increasing the net buffers for the allocation, after doing it everything worked as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants