Skip to content

Conversation

@Raymond0225
Copy link
Contributor

@Raymond0225 Raymond0225 commented Oct 29, 2024

This patch is addressing these issues:

Fixes #58888
Fixes #70777
Fixes #78291

Tests done:
I2S speed test on RT1170 EVK
uart async api test on RT1170 EVK with bandrate 115200, 1000000 and 2000000bps

Note that:
Please set CONFIG_TEST_USERSPACE=n when you do test on RT1170 EVK or
you will get "write access denied" issue. It is another issue not
related to LPUART and eDMA drivers.

@github-actions
Copy link

Hello @Raymond0225, and thank you very much for your first pull request to the Zephyr project!
Our Continuous Integration pipeline will execute a series of checks on your Pull Request commit messages and code, and you are expected to address any failures by updating the PR. Please take a look at our commit message guidelines to find out how to format your commit messages, and at our contribution workflow to understand how to update your Pull Request. If you haven't already, please make sure to review the project's Contributor Expectations and update (by amending and force-pushing the commits) your pull request if necessary.
If you are stuck or need help please join us on Discord and ask your question there. Additionally, you can escalate the review when applicable. 😊

@SynchronicIT
Copy link
Contributor

SynchronicIT commented Oct 29, 2024

For the LPUART driver, in our real-life application, we tested the proposed solution and noticed an instability. It depends on the TCD queue size. When a certain amount "x" of buffers is consumed and the next buffer should be used, the LPUART or DMA simply stops.

CONFIG_DMA_TCD_QUEUE_SIZE=2 gave 4 working buffer updates
CONFIG_DMA_TCD_QUEUE_SIZE=3 gave 5 working buffer updates
CONFIG_DMA_TCD_QUEUE_SIZE=4 gave 1 working buffer updates (just the initial one..)
CONFIG_DMA_TCD_QUEUE_SIZE=5 gave 3 working buffer updates
CONFIG_DMA_TCD_QUEUE_SIZE=6 gave 1 working buffer updates (just the initial one..)
CONFIG_DMA_TCD_QUEUE_SIZE=7 gave 2 working buffer updates
...
CONFIG_DMA_TCD_QUEUE_SIZE=16 gave 5 working buffer updates

And although it seems a bit random, I tried each value at least 5 times and every time exact the same amount, although the uart input was random (at 1mbit/s in this case).

CONFIG_DMA_TCD_QUEUE_SIZE=1 actually gave the most stable result, and it looks like your initializing more buffers than needed at: zephyr/drivers/dma/dma_mcux_edma.c:391
(the max queue size instead of the requested block_count...

I hope this rings a bell...

@matt-wood-ct
Copy link
Contributor

Possibly silly question, does a similar issue apply to the lpspi implementation? I've been having a lot of trouble getting that writing properly with dma too. In my case I'm seeing junk data coming out the spi when using it with dma channels configured

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this, yo actually change the rx_dma_params

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not change it. next_rx_buffer is a member of data->async

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you code indent is wrong should be 8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I will reformat all the related indent and whitespaces.

DerekSnell added a commit to nxp-upstream/zephyr that referenced this pull request Oct 30, 2024
@DerekSnell
Copy link
Contributor

Hi @Raymond0225 ,
I used your eDMA and I2S driver update to test on the FRDM-MCXN947 board. This is based on the open PR Add mcxn10 sai module support. I had to modify your eDMA driver to support v4, and shared my branch. But the i2s_speed test passes on FRDM-MCXN947 board in this branch.

Thanks

@DerekSnell
Copy link
Contributor

Hi @matt-wood-ct ,

does a similar issue apply to the lpspi implementation?

These changes address an issue with the scatter/gather implementation in the eDMA driver, which is not used by the LPSPI driver. So this PR does not impact the LPSPI driver.
Best regards

@Raymond0225
Copy link
Contributor Author

Hi @Raymond0225 , I used your eDMA and I2S driver update to test on the FRDM-MCXN947 board. This is based on the open PR Add mcxn10 sai module support. I had to modify your eDMA driver to support v4, and shared my branch. But the i2s_speed test passes on FRDM-MCXN947 board in this branch.

Thanks

thanks Derek. I wrote the code for DMA V4 based on an old version. I will update the code based on your modifications.

@Raymond0225
Copy link
Contributor Author

For the LPUART driver, in our real-life application, we tested the proposed solution and noticed an instability. It depends on the TCD queue size. When a certain amount "x" of buffers is consumed and the next buffer should be used, the LPUART or DMA simply stops.

CONFIG_DMA_TCD_QUEUE_SIZE=2 gave 4 working buffer updates CONFIG_DMA_TCD_QUEUE_SIZE=3 gave 5 working buffer updates CONFIG_DMA_TCD_QUEUE_SIZE=4 gave 1 working buffer updates (just the initial one..) CONFIG_DMA_TCD_QUEUE_SIZE=5 gave 3 working buffer updates CONFIG_DMA_TCD_QUEUE_SIZE=6 gave 1 working buffer updates (just the initial one..) CONFIG_DMA_TCD_QUEUE_SIZE=7 gave 2 working buffer updates ... CONFIG_DMA_TCD_QUEUE_SIZE=16 gave 5 working buffer updates

And although it seems a bit random, I tried each value at least 5 times and every time exact the same amount, although the uart input was random (at 1mbit/s in this case).

CONFIG_DMA_TCD_QUEUE_SIZE=1 actually gave the most stable result, and it looks like your initializing more buffers than needed at: zephyr/drivers/dma/dma_mcux_edma.c:391 (the max queue size instead of the requested block_count...

I hope this rings a bell...

Thanks for your feedback. I don't understand what you said. I do use all TCDs and linked them together. Can you elaborate how to reproduce the "stops" issue you mentioned here?

@dleach02
Copy link
Member

@Raymond0225 please clean up the file changes to remove the spaces at beginning of lines. The Zephyr coding standard is to use tabs.

@Raymond0225
Copy link
Contributor Author

@Raymond0225 please clean up the file changes to remove the spaces at beginning of lines. The Zephyr coding standard is to use tabs.

thanks. I am cleaning up the code, will update the PR later.

@SynchronicIT
Copy link
Contributor

Thanks for your feedback. I don't understand what you said. I do use all TCDs and linked them together. Can you elaborate how to reproduce the "stops" issue you mentioned here?

We are building a uart to usb bridge, similar to the usb example. As we need to switch both the DTR and baud rate in real-time, we're using the next-gen usb driver (and have made some fixes for it to make it more robust).
To get the lowest latency, we are coupling the usb uart and physical uart in the irq and async callback functions.
On other chipset's we used solely the fifo / irq method, but due to the small fifo on the RT1060, we need to use the async/dma.

Concerning the Rx and TCDs:
I'm feeding the physical uart now with a data set every 1[s] of 256 bytes at 1mbit and 2 mbit and provide the async driver with buffers of 2KB. If I use anything else than a single TCD by setting it in the config (which reflects the .block_count == 1) I do get several requests for new buffers and buffer release events. But after the mentioned number the requests and releases stop, but any irq from the DMA stops. The uart irq keeps on running (giving overflow errors of course). It reproduced very well for the numbers given, which suggests there is a good relationship.
I don't have a simplified example to test it with, which makes it a bit difficult to hand it over.

Concerning the Tx side:
Having the Tx side from a uart_fifo_irq (of the usb) directly towards the physical uart using the async. (thus not using a separate thread to handle the copy / forwarding with the goal to minimize the latency) The Tx Side of the async seems to be giving inconsistent data output, as with too much data, the uart output becomes 10 bits '0' ( a break instead of start - 8 bytes - stop). I've added atomic bits and I have a "uart_tx(..)" in the TX_READY event handling. I'm not sure if that is even allowed.

Concerning switching the baudrate:
When switching the baud rate, the lpuart driver completely reinitializes the driver. In case the DMA is still active, re-enabling the rx dma (uart_rx(...)) fails as the DMA is still active. I added code to forcefully stop the DMA before calling the configure and restart the uart_rx dma. I see two problems. The baud-rate doesn't seem to switch properly (as the data is partially received (although the baudrate is a factor 4 lower), and the buffers are not flushed ( dma seems to continue where it was previously).

Perhaps we could take this discussion offline / beyond this github issue / pull-request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash this commit with the earlier commit that changes this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your feedback. I am not familiar with git commands, I will try to it.

@Raymond0225
Copy link
Contributor Author

Raymond0225 commented Oct 30, 2024

Thanks for your feedback. I don't understand what you said. I do use all TCDs and linked them together. Can you elaborate how to reproduce the "stops" issue you mentioned here?

We are building a uart to usb bridge, similar to the usb example. As we need to switch both the DTR and baud rate in real-time, we're using the next-gen usb driver (and have made some fixes for it to make it more robust). To get the lowest latency, we are coupling the usb uart and physical uart in the irq and async callback functions. On other chipset's we used solely the fifo / irq method, but due to the small fifo on the RT1060, we need to use the async/dma.

Concerning the Rx and TCDs: I'm feeding the physical uart now with a data set every 1[s] of 256 bytes at 1mbit and 2 mbit and provide the async driver with buffers of 2KB. If I use anything else than a single TCD by setting it in the config (which reflects the .block_count == 1) I do get several requests for new buffers and buffer release events. But after the mentioned number the requests and releases stop, but any irq from the DMA stops. The uart irq keeps on running (giving overflow errors of course). It reproduced very well for the numbers given, which suggests there is a good relationship. I don't have a simplified example to test it with, which makes it a bit difficult to hand it over.

Concerning the Tx side: Having the Tx side from a uart_fifo_irq (of the usb) directly towards the physical uart using the async. (thus not using a separate thread to handle the copy / forwarding with the goal to minimize the latency) The Tx Side of the async seems to be giving inconsistent data output, as with too much data, the uart output becomes 10 bits '0' ( a break instead of start - 8 bytes - stop). I've added atomic bits and I have a "uart_tx(..)" in the TX_READY event handling. I'm not sure if that is even allowed.

Concerning switching the baudrate: When switching the baud rate, the lpuart driver completely reinitializes the driver. In case the DMA is still active, re-enabling the rx dma (uart_rx(...)) fails as the DMA is still active. I added code to forcefully stop the DMA before calling the configure and restart the uart_rx dma. I see two problems. The baud-rate doesn't seem to switch properly (as the data is partially received (although the baudrate is a factor 4 lower), and the buffers are not flushed ( dma seems to continue where it was previously).

Perhaps we could take this discussion offline / beyond this github issue / pull-request.

thanks for your feedback. I suggest you have a look at the the test code of uart_async_api test case, there is a function "test_double_buffer" which may be close to your use case. block_count is only valid during DMA configuration. I suggest you set it as 1. after that, each time when driver request user rx buffer, callback function will be triggered. Driver would not ask for new buffer until current rx buffer is filled fully. There may be multi rx ready event between 2 buf requests.
2 TCDs usually is enough. It is not related to block_count. I suggest set block_count to 1 whatever how many TCD you set.

@SynchronicIT
Copy link
Contributor

thanks for your feedback. I suggest you have a look at the the test code of uart_async_api test case, there is a function "test_double_buffer" which may be close to your use case. block_count is only valid during DMA configuration. I suggest you set it as 1. after that, each time when driver request user rx buffer, callback function will be triggered. Driver would not ask for new buffer until current rx buffer is filled fully. There may be multi rx ready event between 2 buf requests. 2 TCDs usually is enough. It is not related to block_count. I suggest set block_count to 1 whatever how many TCD you set.

I've ran the proposed tests. The test succeeds only when using the TCM regions for the buffers. Is this mandatory?
If the baudrate is higher than 115200 the first tests fail. Also the tests only simulate ideal situation. (Only transmit when the Rx is handled completely). . During the tests i do see the error "No buffers to release from RX DMA!"
I'll update the buffers to reside in the TCM to see if that solves my challenges.

Tx seems to be working now, by moving "uart_tx(..)" to a work task instead of directly in the callback. This is aparently not allowed. (perhaps something to add to the documentation)

Switching baudrate at runtime while DMA is active is still a challenge. Not sure what is actually failing (the dma buffers or the uart settings). Using the fifo api switching baudrate works well.

@hakehuang
Copy link
Contributor

hakehuang commented Oct 31, 2024

@Raymond0225 , I rebase you pr to the latest zephyr code. and test on mimxrt1170_evk_cm7 it fails as below log

scripts/twister --device-testing --device-serial /dev/ttyACM0 -T tests/drivers/uart/uart_async_api/

uart:~$ *** Booting Zephyr OS build (tainted) v4.0.0-rc1-15-g359a81e7db80 ***
Running TESTSUITE uart_async_chain_read
===================================================================
E: No buffers to release from RX DMA!
E: No buffers to release from RX DMA!
START - test_chained_read

    Assertion failed at WEST_TOPDIR/zephyr/tests/drivers/uart/uart_async_api/src/test_uart_async.c:355: test_chained_read_callback: rx_data_idx + evt->data.rx.len <= sizeof(chained_cpy_buf) is false

ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/kernel/sched.c:1393
        aborting essential thread 0x80002580
E: r0/a1:  0x00000004  r1/a2:  0x00000571  r2/a3:  0x4007c000
E: r3/a4:  0x00000004 r12/ip:  0x00000000 r14/lr:  0x3000a55b
E:  xpsr:  0x21000000
E: Faulting instruction address (r15/pc): 0x3000b858
E: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
E: Current thread: 0x80002580 (sysworkq)
E: Halting system

last commit after rebase
'''
commit 359a81e7db8074487a1d85bf059848a6e92bfac4 (HEAD -> pr_80584_edma_async)
Author: Raymond [email protected]
Date: Wed Oct 30 15:31:19 2024 -0500

drivers: dma: edma: Fix compile issue for eDMA V4

Some data structures are different between eDMA and eDMA V4, use
different macros defined in SDK driver for each of them.
Make indent/leading space modifications.

Signed-off-by: Raymond <[email protected]>

'''

@DerekSnell
Copy link
Contributor

Hi @SynchronicIT ,

The test succeeds only when using the TCM regions for the buffers. Is this mandatory?

The buffers used by the DMA need to be in non-cached memory, otherwise there may be a cache coherency issue. The TCMs are not cached, which is one option for the buffers. Another option is to use slower memories like external SDRAM or internal OCRAM, but the buffers need to be placed in a nocache region. On the RT1xxx platform, the cache is only used by the CPU. Other masters in the SOC like the DMA do not use the cache. This can lead to a cache coherency issue where the DMA is reading/writing to the physical RAM, but the CPU is accessing the cache.

Switching baudrate at runtime while DMA is active is still a challenge.

Your use-case to dynamically change the baud rate is beyond the scope of this PR. You can create a GitHub Discussion for that separate topic.

Thanks for your feedback with this PR.

@Raymond0225
Copy link
Contributor Author

@Raymond0225 , I rebase you pr to the latest zephyr code. and test on mimxrt1170_evk_cm7 it fails as below log

scripts/twister --device-testing --device-serial /dev/ttyACM0 -T tests/drivers/uart/uart_async_api/

uart:~$ *** Booting Zephyr OS build (tainted) v4.0.0-rc1-15-g359a81e7db80 ***
Running TESTSUITE uart_async_chain_read
===================================================================
E: No buffers to release from RX DMA!
E: No buffers to release from RX DMA!
START - test_chained_read

    Assertion failed at WEST_TOPDIR/zephyr/tests/drivers/uart/uart_async_api/src/test_uart_async.c:355: test_chained_read_callback: rx_data_idx + evt->data.rx.len <= sizeof(chained_cpy_buf) is false

ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/kernel/sched.c:1393
        aborting essential thread 0x80002580
E: r0/a1:  0x00000004  r1/a2:  0x00000571  r2/a3:  0x4007c000
E: r3/a4:  0x00000004 r12/ip:  0x00000000 r14/lr:  0x3000a55b
E:  xpsr:  0x21000000
E: Faulting instruction address (r15/pc): 0x3000b858
E: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
E: Current thread: 0x80002580 (sysworkq)
E: Halting system

last commit after rebase ''' commit 359a81e7db8074487a1d85bf059848a6e92bfac4 (HEAD -> pr_80584_edma_async) Author: Raymond [email protected] Date: Wed Oct 30 15:31:19 2024 -0500

drivers: dma: edma: Fix compile issue for eDMA V4

Some data structures are different between eDMA and eDMA V4, use
different macros defined in SDK driver for each of them.
Make indent/leading space modifications.

Signed-off-by: Raymond <[email protected]>

'''

Thanks Hake. I got same error when I try to put TCD pool to DTCM. it is because some fields inside edma_tcd_t is not initialized.
I will fix it in the following commit. I add a line "memset(&DEV_CFG(dev)->tcdpool[channel][i],0,sizeof(DEV_CFG(dev)->tcdpool[channel][i]));" during TCD pool initialization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using write_idx here before we initialize it? Also, do we need to initialize tcd here at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, we don't need initialize tcd here, this line is a typo, I will correct it. thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a check here to see if write_idx == CONFIG_DMA_TCD_QUEUE_SIZE? If we still have blocks to queue at that point, I think we should return an error to the user indicating that they need to raise CONFIG_DMA_TCD_QUEUE_SIZE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea

Copy link
Contributor

@DerekSnell DerekSnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Raymond0225 ,
Thank you for all your hard work with these fixes.

I am only blocking this to see your updates to i2s_mcux_sai.c with the loop SG support.

Otherwise, I see no issues with your updates. I do think we can improve with comments to help understand how your eDMA driver works, and provided some comments for you to review.

Also, regarding this note in your commit message:

please set CONFIG_TEST_USERSPACE=n when you do test on RT1170 EVK or
you will get "write access denied" issue. It is another issue not related to LPUART and eDMA drivers.

It's good to include that detail in the PR description (and I see it is already there), so others are aware how to replicate your test. But I would not include in the commit message. Over time, that CONFIG_TEST_USERSPACE=n may no longer apply.

And for the PR description, I would point out the CONFIG_TEST_USERSPACE=n requirement is specifically for the uart_async_api test. The drivers in this PR are not impacted by that Kconfig, but the test app for the LPUART driver will fail today with userspace enabled.

Thanks

@Raymond0225 Raymond0225 force-pushed the eDMA_loop_SG_mode_I2S_lpuart branch from 212fb78 to 9710450 Compare November 4, 2024 21:18
@Raymond0225
Copy link
Contributor Author

Much improved, @Raymond0225. And easier to review now, thank you.

But it still needs a little work.

thanks, have done

@Raymond0225 Raymond0225 closed this Nov 6, 2024
@Raymond0225 Raymond0225 reopened this Nov 6, 2024
Pending length should be mainloop count multiply NBYTES of minor loops.

Signed-off-by: Raymond Lei <[email protected]>
Existing dynamic SG mode have some issues which cause uart sync api and
I2S speed test failed.

These issues are: a wrong "Done" bit issue found on UART async api
test. An invalid destination address issue found on I2S speed
test. By introducing loop SG mode, these issues are all fixed.

Some data structures are different between eDMA and eDMA V4, use
different macros defined in SDK driver for each of them.

Signed-off-by: Raymond Lei <[email protected]>
I2S speed test failed because of "Invalid destination address" issue By use
eDMA loop SG mode, this issue is fixed.

Fixes: zephyrproject-rtos#70777

Signed-off-by: Raymond Lei <[email protected]>
New implemented eDMA loop SG mode can be used to lpuart driver to fix
the conflict of "Done" bit issue which cause uart_async_api test failed
when SG mode is enabled.

Also, this loop SG mode works well even with a high bandrate(test done
on 2000000bps).
Fixes: zephyrproject-rtos#78291

Signed-off-by: Raymond Lei <[email protected]>
To reduce the latency of CPU accessing/modifying SW TCD content, it
better to put TCD pool to DTCM by default.

Signed-off-by: Raymond Lei <[email protected]>
@Raymond0225 Raymond0225 force-pushed the eDMA_loop_SG_mode_I2S_lpuart branch from 2629163 to 5848889 Compare November 6, 2024 22:22
Copy link
Contributor

@DerekSnell DerekSnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @Raymond0225 . Thanks for all your hard work with this.

Copy link
Contributor

@danieldegrasse danieldegrasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor (nonblocking) nits, overall change looks good to me though. Thanks for your work here!

if (DEV_CHANNEL_DATA(dev, channel)->busy) {
status->busy = true;
/* Be aware that here we need multiply NBYTES for each minor loops */
/* pending_length is in bytes. Multiply remaining major loop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should be in prior commit

EDMA_ClearChannelStatusFlags(DEV_BASE(dev), channel, kEDMA_DoneFlag);
EDMA_HW_TCD_CSR(dev, channel) |= DMA_CSR_ESG_MASK;
#elif (CONFIG_DMA_MCUX_EDMA_V3 || CONFIG_DMA_MCUX_EDMA_V4)
/*We have not verified if this issue exist on V3/V4 HW, jut place a holder here. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/*We have not verified if this issue exist on V3/V4 HW, jut place a holder here. */
/*We have not verified if this issue exist on V3/V4 HW, just place a holder here. */

@dleach02 dleach02 added this to the v4.0.0 milestone Nov 8, 2024
@mmahadevan108 mmahadevan108 merged commit a646624 into zephyrproject-rtos:main Nov 8, 2024
25 checks passed
@Raymond0225 Raymond0225 deleted the eDMA_loop_SG_mode_I2S_lpuart branch November 8, 2024 20:00
@RadekPolyend
Copy link

RadekPolyend commented Nov 29, 2024

This PR breaks SAI driver when used with MIMXRT1170_EVK setup as edma error occurs after first transfer.

@DerekSnell
Copy link
Contributor

Hi @RadekPolyend ,
This PR was tested with the i2s_speed test on the MIMXRT1170-EVK board.

Can you share more details how to replicate the issue you are seeing? I recommend creating a new Issue, and adding details there to replicate and understand the root cause. You can link that issue to this PR.

Thanks for raising this. Best regards

@RadekPolyend
Copy link

RadekPolyend commented Dec 2, 2024

Ok but the test doesn't seem to assume the reload of the tx buffers after the start trigger, it just drains the buffers which were loaded before the trigger. The "real" use case is when data are constantly delivered to the queue once the previous ones are returned. Normal use case is when you have only two buffers, you deliver two buffers first, than start trigger and wait for the first buffer to be returned to slab than fill it and write again and this cycle is repeated all the time. I can see there is long version of the test but it still does not rely on buffers returned to slab. I will provide more details soon.

@RadekPolyend
Copy link

#82470

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: DMA Direct Memory Access area: I2S area: UART Universal Asynchronous Receiver-Transmitter platform: NXP Drivers NXP Semiconductors, drivers platform: NXP NXP

Projects

None yet

10 participants