ESP32-S3: Support execute in place from PSRAM #3024

EliteTK · 2025-01-23T20:22:09Z

Submission Checklist 📝

(NA?) I have updated existing examples or added new ones (if applicable).
I have used cargo xtask fmt-packages command to ensure that all changed code is formatted correctly.
My changes were added to the CHANGELOG.md in the proper section.
(NA?) I have added necessary changes to user code to the Migration Guide.
(I believe so) My changes are in accordance to the esp-rs API guidelines
I have read the CONTRIBUTING.md guide and followed its instructions.

Pull Request Details 📖

Description

This implementation mirrors how the ESP-IDF implementation of this feature (which is based on the Cache_Flash_To_SPIRAM_Copy rom function) works except it differs in a few key ways:

The ESP-IDF seems to map .text and .rodata into the first and second 128 cache pages respectively (although looking at the linker scripts, I'm not sure how, but a runtime check confirmed this seemed to be the case). This is reflected in how the Cache_Count_Flash_Pages, Cache_Flash_To_SPIRAM_Copy rom functions and the ESP-IDF code executing them works. The count function can only be made to count flash pages within the first 256 pages (of which there are 512 on the ESP32-S3). Likewise, the copy function will only copy flash pages which are mapped within the first 256 entries (across two calls). As the esp-hal handles mapping .text and .rodata differently, these ROM functions are technically not appropriate if more than 256 pages of flash (.text and .rodata combined) are in use by the application.

Additionally, the functions both contain bugs, one of which the IDF attempts to work around incorrectly, and the other which the IDF does not appear to be aware of. Details of these bugs can be found on the IDF issue/PR tracker (espressif/esp-idf#15262, espressif/esp-idf#15263).

As a result, this PR contains a heavily modified/adjusted rust re-write of the reverse engineered ROM code combined with a vague port of the ESP-IDF code.

There are three additional noteworthy differences from the ESP-IDF version of the code:

The ESP-IDF allows the .text and .rodata segments to be mapped independently and separately allowing only one to be mapped. But the current version of the code does not allow this flexibility. This can be implemented by checking the address of each page entry against the segment locations to determine which segment each address belongs to.
The ESP-IDF calls cache_ll_l1_enable_bus(..., cache_ll_l1_get_bus(..., SOC_EXTRAM_DATA_HIGH, 0)); (functions from the ESP-IDF) in order to "Enable the most high bus, which is used for copying FLASH .text to PSRAM" but on the ESP32-S3 after careful inspection these calls result in a no-op as the address passed to cache_ll_l1_get_bus will result in an empty cache bus mask. It's currently unclear to me if this is a bug in the ESP-IDF code, or if this code (which from cursory investigation is probably not a no-op on the -S2) is solely targetting the ESP32-S3.
The ESP-IDF calls Cache_Flash_To_SPIRAM_Copy with an icache address when copying .text and a dcache address when copying .rodata. This affects which cache the reads will occur through. But the writes always go through a "spare page" (name I came up with during reverse engineering) via the dcache. This code performs all reads through the dcache. I don't know if there's a proper reason to read through the correct cache when doing the copy and this doesn't appear to have any negative impact.

Please let me know what this PR needs as I am not particulary experienced both with ESP32 development, embedded rust, unsafe rust, or this project.

Let me know if this change should have more documentation and where it's appropriate. As the psram stuff is unstable, I have assumed that it doesn't need to go in the migration guide, but I could be wrong about this.

Testing

I've done some basic testing of this version of the code on an ESP32-S3. I plan on doing some more testing (especially of the special handling of the spare page) but I felt like it was a good idea to get this on github for early comments.

Dominaezzz · 2025-01-23T22:04:16Z

I'm thinking the bulk of the code should go into a mmu.rs file.

esp-hal/src/soc/esp32s3/psram.rs

bjoernQ · 2025-01-24T05:09:14Z

This already looks very good from my point of view. Thanks for taking care of this

EliteTK · 2025-01-24T12:49:19Z

I'm thinking the bulk of the code should go into a mmu.rs file.

Do you think it's a good idea to move all the SOC_MMU_ constants into mmu.rs as well (further dropping the MMU_ prefix) and change the existing code references to refer to them?

I would just like to avoid duplicating the constants.

Dominaezzz · 2025-01-24T12:53:42Z

Yeah I'd say move the mmu constants as well

Previously the call to cache_dbus_mmu_set would have likely failed instead. This is more explicit.

This implementation mirrors how the ESP-IDF implementation of this feature (which is based on the `Cache_Flash_To_SPIRAM_Copy` rom function) works except it differs in a few key ways: The ESP-IDF seems to map `.text` and `.rodata` into the first and second 128 cache pages respectively (although looking at the linker scripts, I'm not sure how, but a runtime check confirmed this seemed to be the case). This is reflected in how the `Cache_Count_Flash_Pages`, `Cache_Flash_To_SPIRAM_Copy` rom functions and the ESP-IDF code executing them works. The count function can only be made to count flash pages within the first 256 pages (of which there are 512 on the ESP32-S3). Likewise, the copy function will only copy flash pages which are mapped within the first 256 entries (across two calls). As the esp-hal handles mapping `.text` and `.rodata` differently, these ROM functions are technically not appropriate if more than 256 pages of flash (`.text` and `.rodata` combined) are in use by the application. Additionally, the functions both contain bugs, one of which the IDF attempts to work around incorrectly, and the other which the IDF does not appear to be aware of. Details of these bugs can be found on the IDF issue/PR tracker[0][1]. As a result, this commit contains a heavily modified/adjusted rust re-write of the reverse engineered ROM code combined with a vague port of the ESP-IDF code. There are three additional noteworthy differences from the ESP-IDF version of the code: 1. The ESP-IDF allows the `.text` and `.rodata` segments to be mapped independently and separately allowing only one to be mapped. But the current version of the code does not allow this flexibility. This can be implemented by checking the address of each page entry against the segment locations to determine which segment each address belongs to. 2. The ESP-IDF calls `cache_ll_l1_enable_bus(..., cache_ll_l1_get_bus(..., SOC_EXTRAM_DATA_HIGH, 0));` (functions from the ESP-IDF) in order to "Enable the most high bus, which is used for copying FLASH `.text` to PSRAM" but on the ESP32-S3 after careful inspection these calls result in a no-op as the address passed to cache_ll_l1_get_bus will result in an empty cache bus mask. It's currently unclear to me if this is a bug in the ESP-IDF code, or if this code (which from cursory investigation is probably not a no-op on the -S2) is solely targetting the ESP32-S3. 3. The ESP-IDF calls `Cache_Flash_To_SPIRAM_Copy` with an icache address when copying `.text` and a dcache address when copying `.rodata`. This affects which cache the reads will occur through. But the writes always go through a "spare page" (name I came up with during reverse engineering) via the dcache. This code performs all reads through the dcache. I don't know if there's a proper reason to read through the correct cache when doing the copy and this doesn't appear to have any negative impact. [0]: espressif/esp-idf#15262 [1]: espressif/esp-idf#15263

EliteTK · 2025-01-25T01:12:07Z

I've made all the requested changes.

Let me know if the explicitly limited visibility (pub(super)) is excessive and you want it removed.

Let me know if I've gone too far in the new MMU module refactoring.

I've also tried to make indices and sizes all usize instead of the previous mishmash of u32 and usize.

I have a few ideas for how the MMU module could be better but I didn't want to get too carried away (or more accurately, I already got too carried away on another branch and then scrapped the idea).

I noticed a "bug" in the previous PSRAM code. To be fair I think actually it wouldn't be a bug but more just a confusing error message.

Don't hold back any nitpicks.

EliteTK · 2025-01-25T01:19:29Z

I thought I could get away with running lint on only esp-hal. Apparently not. Whoops.

Dominaezzz

This is looking really good, I'll try and give it a spin myself soon.

Dominaezzz · 2025-01-25T12:53:48Z

esp-hal/src/soc/esp32s3/psram.rs

+    /// Refer to
+    /// <https://docs.espressif.com/projects/esp-idf/en/stable/esp32s3/api-guides/external-ram.html#execute-in-place-xip-from-psram>
+    /// for more information.
+    pub execute_from_psram: bool,


Thinking about this some more, given there's more than one way to cut this XiP cake, especially once we start looking at static linking in PSRAM, I think this boolean should be separate from the PsramConfig.
I'm currently thinking it should go in esp_hal::Config, then init_psram can take a bool to do the right thing.

esp-hal/src/soc/esp32s3/mmu.rs

Dominaezzz · 2025-01-30T22:47:28Z

It'd be a good idea to add a HIL test for this.

I tried using this but my S3 hangs in esp_hal::init for some reason, haven't looked in it yet

yanshay · 2025-01-30T22:52:03Z

A question out of my total lack of knowledge, so forgive me if it's a dumb one. Because this feature involves mapping PSRAM over Flash, is it going to conflict/support in OTA firmware update where you need to do flashing while running and can potentially boot from two different flash addresses?

EliteTK · 2025-01-30T22:54:34Z

@Dominaezzz

Absolutely, I will get onto HIL tests as soon as I've gotten around to actually putting together a USB cable to connect to the magic JTAG USB port. I hope to do that this weekend.

Dominaezzz · 2025-01-30T23:18:20Z

A question out of my total lack of knowledge, so forgive me if it's a dumb one. Because this feature involves mapping PSRAM over Flash, is it going to conflict/support in OTA firmware update where you need to do flashing while running and can potentially boot from two different flash addresses?

Take this with a grain of salt, but iirc writing to flash bypasses the mmu so this shouldn't impede that.
I'm not sure about the booting from multiple addresses.

ProfFan · 2025-02-16T19:42:58Z

Also #2027

yanshay · 2025-05-08T15:54:00Z

@EliteTK I tried this PR for speeding up DMA from PSRAM to RGB display as I know running from flash is an obstacle for PSRAM<->RGB DMA performance.

I didn't see any change on the display.

All I did is added the line below. Is it enough or should I do something in addition?

    let mut psram_config = PsramConfig::default();

    psram_config.execute_from_psram = true; // <-------- This is what I added

    let mut peripherals = esp_hal::init(
        esp_hal::Config::default()
            .with_cpu_clock(CpuClock::max())
            .with_psram(psram_config),
    );

EliteTK · 2025-05-12T10:45:33Z

@EliteTK I tried this PR for speeding up DMA from PSRAM to RGB display as I know running from flash is an obstacle for PSRAM<->RGB DMA performance.

I didn't see any change on the display.

All I did is added the line below. Is it enough or should I do something in addition?
    let mut psram_config = PsramConfig::default();

    psram_config.execute_from_psram = true; // <-------- This is what I added

    let mut peripherals = esp_hal::init(
        esp_hal::Config::default()
            .with_cpu_clock(CpuClock::max())
            .with_psram(psram_config),
    );

As in, you didn't see any change in the sense that it's just as glitchy? I think a side by side video would be ideal, however I will mention something slightly more important:

When I worked on this, I discovered that PSRAM support with esp-hal is quite basic currently, which means (I believe, could be wrong) that the PSRAM is not configured to run at full performance. It's also possible that PSRAM is outright unstable for some use cases.

ESP-IDF does some additional configuration steps with PSRAM which involve writing a known bit of data at a set speed and then playing around with various timing variables to find the fastest PSRAM configuration which doesn't result in data corruption. esp-hal doesn't currently (as of this PR's time) do that. Which suggests that with esp-hal a: PSRAM will be running slower than it could and b: PSRAM might not be stable.

When I ran this PR on my particular hardware it did improve things a little bit, but I still couldn't run the display at the full frequency that I could get it running from ESP-IDF code.

At some point (no idea how soon, this PR is a side-project for a side-project) I will probably get back to this and try to figure this all out, but life is full of side-projects and at this very moment this project is not at the very top of my list.

yanshay · 2025-05-12T23:24:49Z

As in, you didn't see any change in the sense that it's just as glitchy? I think a side by side video would be ideal, however I will mention something slightly more important:

My test so far is just showing a colored screen (blue). You can see two images,
The first is with execute_from_psram = false, the second is with true. No difference whatsoever from what I can tell.
I'd expect some difference (unless of course the issues are not due to PSRAM speed, but @Dominaezzz thinks this is the case and it makes sense looking at the image, having a good band at the left since DMA was able to pull data during the black period and then bad once exhausted that data).

When I ran this PR on my particular hardware it did improve things a little bit, but I still couldn't run the display at the full frequency that I could get it running from ESP-IDF code.

I guess you mean not at the full fps? I'm not even there, just want a steady colored frame.

Maybe I'm doing something wrong?

I'm just doing a loop of dma transfers of full buffer from psram:

    for _ in 0..400 {
        match dpi.send(false, dma_tx) {
            Ok(xfer) => {
                let (_res, dpi2, tx2) = xfer.wait();
                dpi = dpi2;
                dma_tx = tx2;
            }
            Err((e, dpi2, tx2)) => {
                error!("Initial DMA send error: {:?}", e);
                dpi = dpi2;
                dma_tx = tx2;
            }
        }
    }

EliteTK · 2025-05-16T14:19:19Z

I'll try to review it this weekend. I also ran into a performance wall which resulted in the DMA peripheral not being able to feed the DPI peripheral fast enough. But there was definitely a marked difference between with psram and without psram.

Can you show me the DPI peripheral configuration too? I am just curious how fast it is you are driving this screen.

Also, have you tried the hue-walk test code (examples/src/bin/lcd_dpi.rs) which @Dominaezzz wrote for the original DPI PR? That should be able to feed things at max speed. If you still get artifacts with that then I think it's safe to conclude it's not a memory speed issue in your case.

yanshay · 2025-05-16T18:15:29Z

Attached is the entire main.rs I run. It's basically the example code that does the hue-walk test adapted to the device I'm using which worked perfectly. I kept most/all of the code and just remarked what I didn't need (or think I don't need) and modified what I needed to make it work.

To that same example I added DMA of a full buffer. So basically it runs 400 iterations of pushing a full frame and then moves on to the hue-walk. The image from the full frame is what you see in the images earlier. The hue-walk that comes right after works great.

And at the beginning I do the intiailization and there changed the psram_config.execute_from_psram.

main.rs.zip

EliteTK mentioned this pull request Jan 23, 2025

Support XiP from PSRAM #2083

Open

Dominaezzz reviewed Jan 23, 2025

View reviewed changes

esp-hal/src/soc/esp32s3/psram.rs Outdated Show resolved Hide resolved

bjoernQ reviewed Jan 24, 2025

View reviewed changes

esp-hal/src/soc/esp32s3/psram.rs Outdated Show resolved Hide resolved

EliteTK added 3 commits January 25, 2025 00:38

Refactor MMU related code into its own module

bec390c

Guard against an MMU table with no trailing empty entries

e6e7a41

Previously the call to cache_dbus_mmu_set would have likely failed instead. This is more explicit.

EliteTK force-pushed the psram-xip branch from 4697f61 to 98a5d15 Compare January 25, 2025 01:05

Only compile mmu module when psram is enabled

06b721b

EliteTK added 2 commits January 25, 2025 01:30

fixup! Refactor MMU related code into its own module

05e22e0

fixup! ESP32-S3: Support execute in place from PSRAM

ea7ab39

Dominaezzz reviewed Jan 25, 2025

View reviewed changes

yanshay mentioned this pull request May 23, 2025

Introduce RGB/DPI driver #2415

Merged

6 tasks

ProfFan mentioned this pull request Sep 6, 2025

Writing to flash freezes the whole system on ESP32-S3 #4031

Open

ESP32-S3: Support execute in place from PSRAM #3024

Are you sure you want to change the base?

ESP32-S3: Support execute in place from PSRAM #3024

Conversation

EliteTK commented Jan 23, 2025

Submission Checklist 📝

Pull Request Details 📖

Description

Testing

Uh oh!

Dominaezzz commented Jan 23, 2025

Uh oh!

Uh oh!

Uh oh!

bjoernQ commented Jan 24, 2025

Uh oh!

EliteTK commented Jan 24, 2025

Uh oh!

Dominaezzz commented Jan 24, 2025

Uh oh!

EliteTK commented Jan 25, 2025

Uh oh!

EliteTK commented Jan 25, 2025

Uh oh!

Dominaezzz left a comment

Choose a reason for hiding this comment

Uh oh!

Dominaezzz Jan 25, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Dominaezzz commented Jan 30, 2025

Uh oh!

yanshay commented Jan 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

EliteTK commented Jan 30, 2025

Uh oh!

Dominaezzz commented Jan 30, 2025

Uh oh!

ProfFan commented Feb 16, 2025

Uh oh!

yanshay commented May 8, 2025

Uh oh!

EliteTK commented May 12, 2025

Uh oh!

yanshay commented May 12, 2025

Uh oh!

EliteTK commented May 16, 2025

Uh oh!

yanshay commented May 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

yanshay commented Jan 30, 2025 •

edited

Loading