Commit 98a5d15

ESP32-S3: Support execute in place from PSRAM
This implementation mirrors how the ESP-IDF implements this feature (which is based on the `Cache_Flash_To_SPIRAM_Copy` ROM function), except that it differs in a few key ways.

The ESP-IDF seems to map `.text` and `.rodata` into the first and second 128 cache pages respectively (looking at the linker scripts, I'm not sure how, but a runtime check confirmed that this seemed to be the case). This is reflected in how the `Cache_Count_Flash_Pages` and `Cache_Flash_To_SPIRAM_Copy` ROM functions, and the ESP-IDF code that calls them, work. The count function can only be made to count flash pages within the first 256 pages (of which there are 512 on the ESP32-S3). Likewise, the copy function will only copy flash pages which are mapped within the first 256 entries (across two calls).

As esp-hal maps `.text` and `.rodata` differently, these ROM functions are technically not appropriate if the application uses more than 256 pages of flash (`.text` and `.rodata` combined).

Additionally, both functions contain bugs: one which the IDF attempts to work around incorrectly, and another which the IDF does not appear to be aware of. Details of these bugs can be found on the IDF issue/PR tracker[0][1].

As a result, this commit contains a heavily modified Rust rewrite of the reverse-engineered ROM code, combined with a loose port of the ESP-IDF code.

There are three additional noteworthy differences from the ESP-IDF version of the code:

1. The ESP-IDF allows the `.text` and `.rodata` segments to be mapped independently, so that only one of them can be moved to PSRAM. The current version of this code does not allow this flexibility. It could be implemented by checking the address of each page entry against the segment locations to determine which segment each address belongs to.

2. The ESP-IDF calls `cache_ll_l1_enable_bus(..., cache_ll_l1_get_bus(..., SOC_EXTRAM_DATA_HIGH, 0));` (functions from the ESP-IDF) in order to "Enable the most high bus, which is used for copying FLASH `.text` to PSRAM". On the ESP32-S3, however, careful inspection shows that these calls are a no-op, because the address passed to `cache_ll_l1_get_bus` results in an empty cache bus mask. It's currently unclear to me whether this is a bug in the ESP-IDF code, or whether this code (which from a cursory investigation is probably not a no-op on the -S2) is solely targeting the ESP32-S2.

3. The ESP-IDF calls `Cache_Flash_To_SPIRAM_Copy` with an icache address when copying `.text` and a dcache address when copying `.rodata`. This affects which cache the reads go through, but the writes always go through a "spare page" (a name I came up with during reverse engineering) via the dcache. This code performs all reads through the dcache. I don't know of a proper reason to read through the matching cache when doing the copy, and reading through the dcache doesn't appear to have any negative impact.

[0]: espressif/esp-idf#15262
[1]: espressif/esp-idf#15263
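
The per-segment selection mentioned in the first difference is not part of this commit. As a rough, host-side sketch of what it might look like: the `Segment` enum, `classify_entry`, `should_copy`, and the page-index ranges below are all hypothetical names introduced for illustration, and the sketch assumes the segment bounds have already been converted from linker addresses to MMU page indices.

```rust
use core::ops::Range;

/// Hypothetical classification of an MMU entry by the segment it backs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Segment {
    Text,
    Rodata,
    Other,
}

/// Decide which segment the MMU entry at `index` belongs to.
///
/// `text_pages` and `rodata_pages` are assumed to be page-index ranges
/// derived from the segment locations; they are not defined in this commit.
fn classify_entry(
    index: usize,
    text_pages: &Range<usize>,
    rodata_pages: &Range<usize>,
) -> Segment {
    if text_pages.contains(&index) {
        Segment::Text
    } else if rodata_pages.contains(&index) {
        Segment::Rodata
    } else {
        Segment::Other
    }
}

/// Decide whether an entry should be copied to PSRAM, given two
/// hypothetical configuration flags (one per segment).
fn should_copy(segment: Segment, copy_text: bool, copy_rodata: bool) -> bool {
    match segment {
        Segment::Text => copy_text,
        Segment::Rodata => copy_rodata,
        // Design choice of this sketch: pages outside both segments stay in flash.
        Segment::Other => false,
    }
}
```

A copy loop could then call `should_copy(classify_entry(i, ...), ...)` before moving each flash-mapped entry.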
1 parent e6e7a41 commit 98a5d15

3 files changed: +150 -4 lines changed

esp-hal/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - SPI: Added support for 3-wire SPI (#2919)
 - Add separate config for Rx and Tx (UART) #2965
+- ESP32-S3: Support execute in place from PSRAM

 ### Changed

esp-hal/src/soc/esp32s3/mmu.rs

Lines changed: 121 additions & 0 deletions
@@ -5,7 +5,11 @@

 const DBUS_VADDR_BASE: u32 = 0x3C000000;
 const DR_REG_MMU_TABLE: u32 = 0x600C5000;
+const ENTRY_ACCESS_FLASH: u32 = 0;
 const ENTRY_INVALID: u32 = 1 << 14;
+const ENTRY_TYPE: u32 = 1 << 15;
+const ENTRY_VALID: u32 = 0;
+const ENTRY_VALID_VAL_MASK: u32 = 0x3fff;
 const ICACHE_MMU_SIZE: usize = 0x800;

 pub(super) const ENTRY_ACCESS_SPIRAM: u32 = 1 << 15;
@@ -29,6 +33,10 @@ extern "C" {
         num: u32,
         fixed: u32,
     ) -> i32;
+
+    fn Cache_Invalidate_Addr(addr: u32, size: u32);
+    fn Cache_WriteBack_All();
+    fn rom_Cache_WriteBack_Addr(addr: u32, size: u32);
 }

 #[procmacros::ram]
@@ -43,3 +51,116 @@ pub(super) fn last_mapped_index() -> Option<usize> {
 pub(super) fn index_to_data_address(index: usize) -> u32 {
     DBUS_VADDR_BASE + (PAGE_SIZE * index) as u32
 }
+
+/// Count flash-mapped pages, de-duplicating mappings which refer to flash page
+/// 0
+#[procmacros::ram]
+pub(super) fn count_effective_flash_pages() -> usize {
+    let mmu_table_ptr = DR_REG_MMU_TABLE as *const u32;
+    let mut page0_seen = false;
+    let mut flash_pages = 0;
+    for i in 0..(TABLE_SIZE - 1) {
+        let mapping = unsafe { mmu_table_ptr.add(i).read_volatile() };
+        if mapping & (ENTRY_INVALID | ENTRY_TYPE) == ENTRY_VALID | ENTRY_ACCESS_FLASH {
+            if mapping & ENTRY_VALID_VAL_MASK == 0 {
+                if page0_seen {
+                    continue;
+                }
+                page0_seen = true;
+            }
+            flash_pages += 1;
+        }
+    }
+    flash_pages
+}
+
+#[procmacros::ram]
+unsafe fn move_flash_to_psram_with_spare(
+    target_entry: usize,
+    psram_page: usize,
+    spare_entry: usize,
+) {
+    let mmu_table_ptr = DR_REG_MMU_TABLE as *mut u32;
+    let target_entry_addr = DBUS_VADDR_BASE + (target_entry * PAGE_SIZE) as u32;
+    let spare_entry_addr = DBUS_VADDR_BASE + (spare_entry * PAGE_SIZE) as u32;
+    unsafe {
+        mmu_table_ptr
+            .add(spare_entry)
+            .write_volatile(psram_page as u32 | ENTRY_ACCESS_SPIRAM);
+        Cache_Invalidate_Addr(spare_entry_addr, PAGE_SIZE as u32);
+        core::ptr::copy_nonoverlapping(
+            target_entry_addr as *const u8,
+            spare_entry_addr as *mut u8,
+            PAGE_SIZE,
+        );
+        rom_Cache_WriteBack_Addr(spare_entry_addr, PAGE_SIZE as u32);
+        mmu_table_ptr
+            .add(target_entry)
+            .write_volatile(psram_page as u32 | ENTRY_ACCESS_SPIRAM);
+    }
+}
+
+/// Copy flash-mapped pages to PSRAM, copying flash-page 0 only once, and re-map
+/// those pages to the PSRAM copies
+#[procmacros::ram]
+pub(super) unsafe fn copy_flash_to_psram_and_remap(free_page: usize) -> usize {
+    let mmu_table_ptr = DR_REG_MMU_TABLE as *mut u32;
+
+    const SPARE_PAGE: usize = TABLE_SIZE - 1;
+    const SPARE_PAGE_DCACHE_ADDR: u32 = DBUS_VADDR_BASE + (SPARE_PAGE * PAGE_SIZE) as u32;
+
+    let spare_page_mapping = unsafe { mmu_table_ptr.add(SPARE_PAGE).read_volatile() };
+    let mut page0_page = None;
+    let mut psram_page = free_page;
+
+    unsafe { Cache_WriteBack_All() };
+    for i in 0..(TABLE_SIZE - 1) {
+        let mapping = unsafe { mmu_table_ptr.add(i).read_volatile() };
+        if mapping & (ENTRY_INVALID | ENTRY_TYPE) != ENTRY_VALID | ENTRY_ACCESS_FLASH {
+            continue;
+        }
+        if mapping & ENTRY_VALID_VAL_MASK == 0 {
+            match page0_page {
+                Some(page) => {
+                    unsafe {
+                        mmu_table_ptr
+                            .add(i)
+                            .write_volatile(page as u32 | ENTRY_ACCESS_SPIRAM)
+                    };
+                    continue;
+                }
+                None => page0_page = Some(psram_page),
+            }
+        }
+        unsafe { move_flash_to_psram_with_spare(i, psram_page, SPARE_PAGE) };
+        psram_page += 1;
+    }
+
+    // Restore spare page mapping
+    unsafe {
+        mmu_table_ptr
+            .add(SPARE_PAGE)
+            .write_volatile(spare_page_mapping);
+        Cache_Invalidate_Addr(SPARE_PAGE_DCACHE_ADDR, PAGE_SIZE as u32);
+    }
+
+    // Special handling if the spare page was mapped to flash
+    if spare_page_mapping & (ENTRY_INVALID | ENTRY_TYPE) == ENTRY_VALID | ENTRY_ACCESS_FLASH {
+        unsafe {
+            // We're running from ram so using the first page should not cause issues
+            const SECOND_SPARE: usize = 0;
+            let second_spare_mapping = mmu_table_ptr.add(SECOND_SPARE).read_volatile();
+
+            move_flash_to_psram_with_spare(SPARE_PAGE, psram_page, SECOND_SPARE);
+
+            // Restore spare page mapping
+            mmu_table_ptr.add(0).write_volatile(second_spare_mapping);
+            Cache_Invalidate_Addr(
+                DBUS_VADDR_BASE + (SECOND_SPARE * PAGE_SIZE) as u32,
+                PAGE_SIZE as u32,
+            );
+        }
+        psram_page += 1;
+    }
+    psram_page - free_page
+}
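
The counting and remapping logic in this file can be exercised off-target. The following is a minimal host-side model, not the code from this commit: a plain slice stands in for the MMU table, the slice's last element plays the role of the spare entry, and the actual page copy and cache maintenance are omitted. The constants are copied from the diff; the slice-based signatures and `is_flash_mapping` are assumptions made for illustration.

```rust
const ENTRY_INVALID: u32 = 1 << 14;
const ENTRY_TYPE: u32 = 1 << 15;
const ENTRY_VALID_VAL_MASK: u32 = 0x3fff;
const ENTRY_ACCESS_SPIRAM: u32 = 1 << 15;

/// True if `entry` is valid (invalid bit clear) and targets flash (type bit clear).
fn is_flash_mapping(entry: u32) -> bool {
    entry & (ENTRY_INVALID | ENTRY_TYPE) == 0
}

/// Count flash-mapped entries, counting mappings of flash page 0 only once.
/// As in the diff, the last entry (the spare) is not scanned.
fn count_effective_flash_pages(table: &[u32]) -> usize {
    let scanned = table.len().saturating_sub(1);
    let mut page0_seen = false;
    let mut flash_pages = 0;
    for &mapping in &table[..scanned] {
        if !is_flash_mapping(mapping) {
            continue;
        }
        if mapping & ENTRY_VALID_VAL_MASK == 0 {
            if page0_seen {
                continue;
            }
            page0_seen = true;
        }
        flash_pages += 1;
    }
    flash_pages
}

/// Rewrite flash-mapped entries to consecutive PSRAM pages starting at
/// `free_page`, sharing one PSRAM page between all mappings of flash page 0.
/// Returns the number of PSRAM pages consumed. No data is copied here.
fn remap_to_psram(table: &mut [u32], free_page: usize) -> usize {
    if table.is_empty() {
        return 0;
    }
    let spare = table.len() - 1;
    let mut page0_page: Option<usize> = None;
    let mut psram_page = free_page;
    for i in 0..spare {
        let mapping = table[i];
        if !is_flash_mapping(mapping) {
            continue;
        }
        if mapping & ENTRY_VALID_VAL_MASK == 0 {
            match page0_page {
                Some(page) => {
                    // Reuse the PSRAM page that already holds flash page 0.
                    table[i] = page as u32 | ENTRY_ACCESS_SPIRAM;
                    continue;
                }
                None => page0_page = Some(psram_page),
            }
        }
        table[i] = psram_page as u32 | ENTRY_ACCESS_SPIRAM;
        psram_page += 1;
    }
    // The spare entry is handled last, and only if it was flash-mapped.
    if is_flash_mapping(table[spare]) {
        table[spare] = psram_page as u32 | ENTRY_ACCESS_SPIRAM;
        psram_page += 1;
    }
    psram_page - free_page
}
```

For a table whose spare entry is not flash-mapped, the value returned by `remap_to_psram` matches `count_effective_flash_pages`, which is the quantity the PSRAM size check relies on.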

esp-hal/src/soc/esp32s3/psram.rs

Lines changed: 28 additions & 4 deletions
@@ -108,6 +108,13 @@ pub struct PsramConfig {
     pub flash_frequency: FlashFreq,
     /// Frequency of PSRAM memory
     pub ram_frequency: SpiRamFreq,
+    /// Copy code and read-only data from flash to PSRAM and remap the
+    /// respective pages to point to PSRAM
+    ///
+    /// Refer to
+    /// https://docs.espressif.com/projects/esp-idf/en/stable/esp32s3/api-guides/external-ram.html#execute-in-place-xip-from-psram
+    /// for more information.
+    pub execute_from_psram: bool,
 }

 /// Initialize PSRAM to be used for data.
@@ -124,7 +131,9 @@ pub(crate) fn init_psram(config: PsramConfig) {
     const CONFIG_ESP32S3_DATA_CACHE_SIZE: u32 = 0x8000;
     const CONFIG_ESP32S3_DCACHE_ASSOCIATED_WAYS: u8 = 8;
     const CONFIG_ESP32S3_DATA_CACHE_LINE_SIZE: u8 = 32;
-    const START_PAGE: u32 = 0;
+
+    let mut free_page = 0;
+    let mut psram_size = config.size.get();

     extern "C" {
         fn rom_config_instruction_cache_mode(
@@ -144,8 +153,23 @@ pub(crate) fn init_psram(config: PsramConfig) {
         fn Cache_Resume_DCache(param: u32);
     }

-    let start = unsafe {
+    // Vaguely based off of the ESP-IDF equivalent code:
+    // https://github.com/espressif/esp-idf/blob/3c99557eeea4e0945e77aabac672fbef52294d54/components/esp_psram/mmu_psram_flash.c#L46-L134
+    if config.execute_from_psram {
+        let flash_pages = mmu::count_effective_flash_pages();
+        let psram_pages = psram_size / mmu::PAGE_SIZE;
+
+        if flash_pages > psram_pages {
+            panic!("Cannot execute from PSRAM: The number of PSRAM pages ({}) is too small to fit {} flash pages", psram_pages, flash_pages);
+        }

+        let psram_pages_used = unsafe { mmu::copy_flash_to_psram_and_remap(free_page) };
+
+        free_page += psram_pages_used;
+        psram_size -= psram_pages_used * mmu::PAGE_SIZE;
+    }
+
+    let start = unsafe {
         // calculate the PSRAM start address to map
         // the linker scripts can produce a gap between mapped IROM and DROM segments
         // bigger than a flash page - i.e. we will see an unmapped memory slot
@@ -177,9 +201,9 @@ pub(crate) fn init_psram(config: PsramConfig) {
         if mmu::cache_dbus_mmu_set(
             mmu::ENTRY_ACCESS_SPIRAM,
             start,
-            START_PAGE << 16,
+            (free_page as u32) << 16,
             64,
-            config.size.get() as u32 / 1024 / 64, // number of pages to map
+            (psram_size / mmu::PAGE_SIZE) as u32, // number of pages to map
             0,
         ) != 0
         {
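
The page accounting in `init_psram` reduces to a small piece of arithmetic: PSRAM pages consumed by the flash copy are taken from the front, and the remainder is mapped as data starting at `free_page << 16`. The sketch below illustrates that arithmetic only. `PsramLayout` and `plan_psram_layout` are hypothetical names, the 64 KiB page size is inferred from the `/ 1024 / 64` expression the diff replaces, and the sketch assumes the copy consumes exactly `flash_pages` pages.

```rust
/// MMU page size in bytes. Assumed to be 64 KiB (inferred from the diff).
const PAGE_SIZE: usize = 64 * 1024;

/// Hypothetical summary of how PSRAM is split after initialization.
#[derive(Debug, PartialEq, Eq)]
struct PsramLayout {
    /// First PSRAM page available for data (the diff's `free_page`).
    first_data_page: usize,
    /// Number of PSRAM pages left to map as data.
    data_pages: usize,
}

impl PsramLayout {
    /// Byte offset into PSRAM of the data region, i.e. `free_page << 16`.
    fn start_offset(&self) -> u32 {
        (self.first_data_page as u32) << 16
    }
}

/// Split `psram_size` bytes of PSRAM between copied flash pages and data.
/// Returns `None` where the real code panics, i.e. when the flash pages
/// do not fit into PSRAM.
fn plan_psram_layout(
    psram_size: usize,
    flash_pages: usize,
    execute_from_psram: bool,
) -> Option<PsramLayout> {
    let psram_pages = psram_size / PAGE_SIZE;
    if !execute_from_psram {
        // Nothing is copied, so all of PSRAM is available for data.
        return Some(PsramLayout {
            first_data_page: 0,
            data_pages: psram_pages,
        });
    }
    if flash_pages > psram_pages {
        return None;
    }
    Some(PsramLayout {
        first_data_page: flash_pages,
        data_pages: psram_pages - flash_pages,
    })
}
```

With 8 MiB of PSRAM and 20 flash pages, for example, this leaves 108 data pages starting at byte offset `0x140000`.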
