
Conversation

@devprodest
Contributor

@devprodest devprodest commented Mar 12, 2025

Description

There are platforms where copying data with the CPU is far from optimal; the easiest and fastest way is to use DMA.
To support this, the standard memcpy used for stream buffers can be replaced with a DMA-based memcpy.

Test Steps

No additional actions are required. This functionality improves the flexibility of the code.
To use the alternative function, you need to define pvPortMemCpyStreamBuffer in the FreeRTOSIPConfig.h file.

#include "utils\memcpy_with_dma.h" ///< my headeris for example only

#define pvPortMemCpyStreamBuffer(dst, src, count) memcpy_with_dma(dst, src, count)

If not specified, memcpy from the standard library will be used.

Checklist:

  • I have tested my changes. No regression in existing tests.
  • I have modified and/or added unit-tests to cover the code changes in this Pull Request.

Related Issue

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…am buffers. Useful for optimization in systems that allow DMA to be used only in some memory areas.
@devprodest devprodest requested a review from a team as a code owner March 12, 2025 10:47
@tony-josi-aws
Member

@devprodest

Thanks for contributing to FreeRTOS+TCP.

Since the memory allocator for TCP stream buffers can be configured if needed by defining pvPortMallocLarge it makes sense to have something similar for memcpy that uses those buffers if it adds value.

I'm curious to know about the usage of the DMA-based memcpy (memcpy_with_dma in your case) here. Are you yielding the RTOS task (inside memcpy_with_dma) after the DMA has been set up and waiting for the DMA completion interrupt to make the task ready again? If that's the case, are you gaining significant performance benefits once you take into account the extra compute time spent on the context switches?
Or is it polled inside memcpy_with_dma? I believe there is not much performance benefit if the DMA is set up for the transfer and then polled for completion.
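
(For illustration of the interrupt-driven approach referred to in the question above, here is a minimal sketch in which the calling task blocks on a direct-to-task notification until a DMA completion ISR wakes it. DMA_Start(), DMA_ClearIrq() and the ISR name are hypothetical placeholders, not a real driver API; only the FreeRTOS notification calls are real.)

#include <stddef.h>
#include "FreeRTOS.h"
#include "task.h"

/* Placeholder hardware hooks - assumptions for this sketch only. */
extern void DMA_Start( void * pvDest, const void * pvSource, size_t uxBytes );
extern void DMA_ClearIrq( void );

static TaskHandle_t xWaitingTask = NULL;

void * memcpy_with_dma( void * pvDest, const void * pvSource, size_t uxBytes )
{
    xWaitingTask = xTaskGetCurrentTaskHandle();

    DMA_Start( pvDest, pvSource, uxBytes );      /* program and start the channel */

    /* Block until the completion ISR gives a notification. */
    ulTaskNotifyTake( pdTRUE, portMAX_DELAY );

    return pvDest;
}

void DMA_CompletionISR( void )                   /* hypothetical ISR name */
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    DMA_ClearIrq();                              /* acknowledge the interrupt */

    if( xWaitingTask != NULL )
    {
        vTaskNotifyGiveFromISR( xWaitingTask, &xHigherPriorityTaskWoken );
        xWaitingTask = NULL;
    }

    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
}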

@devprodest
Contributor Author

devprodest commented Mar 12, 2025

@tony-josi-aws
Hi, on my platform the DMA can copy data 128 bits at a time, unlike the CPU, which copies byte by byte for unaligned accesses (or 32 bits at a time when aligned).
The DMA takes care of the alignment itself.
Due to this change, the stack speeds up by up to 10x.
I'm already using custom malloc functions to implement zero-copy, since the Ethernet peripheral in my chip can only use certain memory addresses.

@devprodest
Contributor Author

Due to the features of the platform, and to save the resources spent on context switching, polling of the readiness flag is used, but this does not prevent the copy from being very fast.
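
(A minimal sketch of the polled variant described above. The status register and busy bit are hypothetical placeholders; the author's actual memcpy_with_dma is not part of this PR.)

#include <stddef.h>
#include <stdint.h>

/* Placeholder DMA interface - names are assumptions, not a real driver API. */
extern volatile uint32_t DMA_STATUS;             /* hypothetical status register */
#define DMA_STATUS_BUSY    ( 1UL << 0 )          /* hypothetical busy bit */
extern void DMA_Start( void * pvDest, const void * pvSource, size_t uxBytes );

void * memcpy_with_dma( void * pvDest, const void * pvSource, size_t uxBytes )
{
    DMA_Start( pvDest, pvSource, uxBytes );      /* program and start the transfer */

    /* Spin on the readiness flag; no context switch is performed. */
    while( ( DMA_STATUS & DMA_STATUS_BUSY ) != 0UL )
    {
        /* busy-wait */
    }

    return pvDest;
}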

@tony-josi-aws
Member

@devprodest

Thanks for the update.

So memcpy_with_dma sets up the DMA for copying (which on your platform is faster thanks to the 128-bit-wide accesses) and then polls the readiness flag to see whether the copy has completed, which is faster than a normal memcpy because the DMA itself is faster, is that right?

Due to this change, the stack speeds up by up to 10x.

That's a good improvement; was it measured using IPERF? Also wondering which hardware platform you are using.

I'm already using custom malloc functions to implement zero-copy, since the Ethernet peripheral in my chip can only use certain memory addresses.

You can take a look at this page: TCP/IP Stack Network Buffers Allocation Schemes and their implication on simplicity, CPU load, and throughput performance, if you haven't already, to see if BufferAllocation_1.c fits your use case better, as it lets you statically allocate network buffers in a specific section of your memory map. [example]
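
(For reference, a sketch of the vNetworkInterfaceAllocateRAMToBuffers() hook that BufferAllocation_1.c relies on, based on the example on the linked page. The ".sram_buffers" section name and the GCC section attribute are assumptions showing how the buffers could be pinned to a DMA-capable memory region via the linker script.)

#include "FreeRTOS.h"
#include "FreeRTOS_IP.h"
#include "NetworkBufferManagement.h"

#define BUFFER_SIZE               ( ipTOTAL_ETHERNET_FRAME_SIZE + ipBUFFER_PADDING )
#define BUFFER_SIZE_ROUNDED_UP    ( ( BUFFER_SIZE + 7 ) & ~0x07UL )

/* Statically allocated Ethernet buffers; the section name is an assumption. */
static uint8_t ucBuffers[ ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS ][ BUFFER_SIZE_ROUNDED_UP ]
    __attribute__( ( section( ".sram_buffers" ), aligned( 8 ) ) );

void vNetworkInterfaceAllocateRAMToBuffers( NetworkBufferDescriptor_t pxNetworkBuffers[ ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS ] )
{
    BaseType_t x;

    for( x = 0; x < ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS; x++ )
    {
        /* Point the descriptor ipBUFFER_PADDING bytes into the static buffer. */
        pxNetworkBuffers[ x ].pucEthernetBuffer = &( ucBuffers[ x ][ ipBUFFER_PADDING ] );

        /* Store a pointer back to the descriptor at the start of the buffer,
         * as required by the allocation scheme (32-bit target assumed). */
        *( ( uint32_t * ) &ucBuffers[ x ][ 0 ] ) = ( uint32_t ) &( pxNetworkBuffers[ x ] );
    }
}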

@devprodest
Contributor Author

devprodest commented Mar 13, 2025

So memcpy_with_dma sets up the DMA for copying (which on your platform is faster thanks to the 128-bit-wide accesses) and then polls the readiness flag to see whether the copy has completed, which is faster than a normal memcpy because the DMA itself is faster, is that right?

Yes, that's right. This increases the copying speed.
I wrote more details below.

That's a good improvement; was it measured using IPERF? Also wondering which hardware platform you are using.

No, the check was carried out using an algorithm similar to the actual application:
256 megabytes of data were uploaded to and downloaded from the device.

I can't say which platform yet; it's a trade secret. But I can describe some of its features. It is a video processing chip, similar to a GoPro or other such cameras, but with some interesting effects. The CPU is a 32-bit RISC-V.
My platform includes several different memories: TCM, SRAM, and DDR (a separate IC). The DMA can only work with SRAM and DDR.

TCM is used for running the firmware. SRAM stores the stack's buffers and the other buffers that the DMA needs to work with.

The main mode of operation is uploading data over the network into DDR, processing it, and downloading it back to the PC.

This is where the bottleneck is: DDR is very slow memory compared to SRAM, and byte-by-byte copying is a very long operation. The DMA does this very quickly and in large transactions.

You can take a look at this page: TCP/IP Stack Network Buffers Allocation Schemes and their implication on simplicity, CPU load, and throughput performance, if you haven't already, to see if BufferAllocation_1.c fits your use case better, as it lets you statically allocate network buffers in a specific section of your memory map. [example]

Thank you for this suggestion. I tried it a few days ago and it didn't have the desired effect. That allocator is not currently in use, and in any case it doesn't solve the whole problem.

aggarg
aggarg previously approved these changes Mar 13, 2025
@tony-josi-aws
Member

/bot run formatting

@tony-josi-aws tony-josi-aws merged commit 1e32b23 into FreeRTOS:main Mar 13, 2025
10 checks passed