Commit 4697f61

ESP32-S3: Support execute in place from PSRAM
This implementation mirrors how the ESP-IDF implements this feature (which is based on the `Cache_Flash_To_SPIRAM_Copy` ROM function), except that it differs in a few key ways.

The ESP-IDF seems to map `.text` and `.rodata` into the first and second 128 cache pages respectively (looking at the linker scripts, I'm not sure how, but a runtime check confirmed this seemed to be the case). This is reflected in how the `Cache_Count_Flash_Pages` and `Cache_Flash_To_SPIRAM_Copy` ROM functions, and the ESP-IDF code calling them, work. The count function can only be made to count flash pages within the first 256 pages (of which there are 512 on the ESP32-S3). Likewise, the copy function will only copy flash pages which are mapped within the first 256 entries (across two calls).

As esp-hal handles mapping `.text` and `.rodata` differently, these ROM functions are technically not appropriate if more than 256 pages of flash (`.text` and `.rodata` combined) are in use by the application.

Additionally, both functions contain bugs, one of which the IDF attempts to work around incorrectly, and another which the IDF does not appear to be aware of. Details of these bugs can be found on the IDF issue/PR tracker[0][1].

As a result, this commit contains a heavily modified Rust rewrite of the reverse-engineered ROM code, combined with a loose port of the ESP-IDF code.

There are three additional noteworthy differences from the ESP-IDF version of the code:

1. The ESP-IDF allows the `.text` and `.rodata` segments to be mapped independently, so that only one of them needs to be copied. The current version of this code does not allow this flexibility. It could be implemented by checking the address of each page entry against the segment locations to determine which segment each address belongs to.

2. The ESP-IDF calls `cache_ll_l1_enable_bus(..., cache_ll_l1_get_bus(..., SOC_EXTRAM_DATA_HIGH, 0));` (functions from the ESP-IDF) in order to "Enable the most high bus, which is used for copying FLASH `.text` to PSRAM". On the ESP32-S3, careful inspection shows that these calls are a no-op, because the address passed to `cache_ll_l1_get_bus` results in an empty cache bus mask. It's currently unclear to me if this is a bug in the ESP-IDF code, or if this code (which from cursory investigation is probably not a no-op on the -S2) is solely targeting the ESP32-S3.

3. The ESP-IDF calls `Cache_Flash_To_SPIRAM_Copy` with an icache address when copying `.text` and a dcache address when copying `.rodata`. This affects which cache the reads go through, but the writes always go through a "spare page" (a name I came up with during reverse engineering) via the dcache. This code performs all reads through the dcache. I don't know if there's a proper reason to read through the matching cache when doing the copy, and reading through the dcache doesn't appear to have any negative impact.

[0]: espressif/esp-idf#15262
[1]: espressif/esp-idf#15263
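The per-segment mapping mentioned in difference 1 is not implemented by this commit. A minimal sketch of how such a check might look is given here, assuming that MMU entry `i` covers the 64 KiB page at `base + i * MMU_PAGE_SIZE` on both buses. `MMU_PAGE_SIZE` and `MMU_DBUS_VADDR_BASE` are taken from the diff; `MMU_IBUS_VADDR_BASE`, `Segment`, `SegmentRanges` and `classify_entry` are hypothetical names introduced only for illustration, and real segment bounds would have to come from linker symbols.

```rust
use core::ops::Range;

/// Size of one MMU page (value taken from the diff).
const MMU_PAGE_SIZE: u32 = 0x10000;
/// Base of the data-bus view of the MMU table (value taken from the diff).
const MMU_DBUS_VADDR_BASE: u32 = 0x3C00_0000;
/// Assumed base of the instruction-bus view; not part of this commit.
const MMU_IBUS_VADDR_BASE: u32 = 0x4200_0000;

/// Hypothetical classification of an MMU entry.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Segment {
    Text,
    Rodata,
    Other,
}

/// Hypothetical half-open virtual address ranges of the two segments.
struct SegmentRanges {
    text: Range<u32>,
    rodata: Range<u32>,
}

/// Returns true if the page starting at `page_start` overlaps `range`.
fn page_overlaps(page_start: u32, range: &Range<u32>) -> bool {
    page_start < range.end && page_start + MMU_PAGE_SIZE > range.start
}

/// Decide which segment (if any) the page behind MMU entry `entry` belongs to.
fn classify_entry(entry: usize, ranges: &SegmentRanges) -> Segment {
    let offset = entry as u32 * MMU_PAGE_SIZE;
    if page_overlaps(MMU_IBUS_VADDR_BASE + offset, &ranges.text) {
        Segment::Text
    } else if page_overlaps(MMU_DBUS_VADDR_BASE + offset, &ranges.rodata) {
        Segment::Rodata
    } else {
        Segment::Other
    }
}
```

With a classifier like this, the copy loop could skip entries whose segment the user has not asked to be copied. Whether overlap or full containment is the right test depends on how the segments are aligned, which this sketch does not try to settle.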
1 parent f247b40 commit 4697f61

2 files changed: +126 -9 lines changed

esp-hal/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - SPI: Added support for 3-wire SPI (#2919)
 - Add separate config for Rx and Tx (UART) #2965
+- ESP32-S3: Support execute in place from PSRAM
 
 ### Changed
 
esp-hal/src/soc/esp32s3/psram.rs

Lines changed: 125 additions & 9 deletions
@@ -109,6 +109,8 @@ pub struct PsramConfig {
     pub flash_frequency: FlashFreq,
     /// Frequency of PSRAM memory
     pub ram_frequency: SpiRamFreq,
+    /// Copy Flash to PSRAM
+    pub copy_flash: bool,
 }
 
 /// Initialize PSRAM to be used for data.
@@ -119,14 +121,21 @@ pub(crate) fn init_psram(config: PsramConfig) {
     let mut config = config;
     utils::psram_init(&mut config);
 
+    const MMU_PAGE_SIZE: u32 = 0x10000;
     const CONFIG_ESP32S3_INSTRUCTION_CACHE_SIZE: u32 = 0x4000;
     const CONFIG_ESP32S3_ICACHE_ASSOCIATED_WAYS: u8 = 8;
     const CONFIG_ESP32S3_INSTRUCTION_CACHE_LINE_SIZE: u8 = 32;
     const CONFIG_ESP32S3_DATA_CACHE_SIZE: u32 = 0x8000;
     const CONFIG_ESP32S3_DCACHE_ASSOCIATED_WAYS: u8 = 8;
     const CONFIG_ESP32S3_DATA_CACHE_LINE_SIZE: u8 = 32;
+    const MMU_INVALID: u32 = 1 << 14;
     const MMU_ACCESS_SPIRAM: u32 = 1 << 15;
-    const START_PAGE: u32 = 0;
+    const ICACHE_MMU_SIZE: usize = 0x800;
+    const FLASH_MMU_TABLE_SIZE: usize = ICACHE_MMU_SIZE / core::mem::size_of::<u32>();
+    const DR_REG_MMU_TABLE: u32 = 0x600C5000;
+
+    let mut free_page = 0;
+    let mut psram_size = config.size.get();
 
     extern "C" {
         fn rom_config_instruction_cache_mode(
@@ -161,14 +170,122 @@ pub(crate) fn init_psram(config: PsramConfig) {
             num: u32,
             fixed: u32,
         ) -> i32;
+
+        fn Cache_WriteBack_All();
+        fn Cache_Invalidate_Addr(addr: u32, size: u32);
+        fn rom_Cache_WriteBack_Addr(addr: u32, size: u32);
+    }
+
+    // Vaguely based off of the ESP-IDF equivalent code:
+    // https://github.com/espressif/esp-idf/blob/3c99557eeea4e0945e77aabac672fbef52294d54/components/esp_psram/mmu_psram_flash.c#L46-L134
+    if config.copy_flash {
+        const MMU_VALID: u32 = 0;
+        const MMU_TYPE: u32 = 1 << 15;
+        const MMU_ACCESS_FLASH: u32 = 0;
+        const MMU_VALID_VAL_MASK: u32 = 0x3fff;
+        const MMU_DBUS_VADDR_BASE: u32 = 0x3C000000;
+        const SPARE_PAGE: usize = FLASH_MMU_TABLE_SIZE - 1;
+        const SPARE_PAGE_DCACHE_ADDR: u32 = MMU_DBUS_VADDR_BASE + SPARE_PAGE as u32 * MMU_PAGE_SIZE;
+
+        let mmu_table_ptr = DR_REG_MMU_TABLE as *mut u32;
+
+        unsafe fn move_flash_to_psram_with_spare(mmu_table_ptr: *mut u32, target_entry: usize, psram_page: u32, spare_entry: usize) {
+            let target_entry_addr = MMU_DBUS_VADDR_BASE + target_entry as u32 * MMU_PAGE_SIZE;
+            let spare_entry_addr = MMU_DBUS_VADDR_BASE + spare_entry as u32 * MMU_PAGE_SIZE;
+            unsafe {
+                mmu_table_ptr
+                    .add(spare_entry)
+                    .write_volatile(psram_page | MMU_ACCESS_SPIRAM);
+                Cache_Invalidate_Addr(spare_entry_addr, MMU_PAGE_SIZE);
+                core::ptr::copy_nonoverlapping(
+                    target_entry_addr as *const u8,
+                    spare_entry_addr as *mut u8,
+                    MMU_PAGE_SIZE as usize,
+                );
+                rom_Cache_WriteBack_Addr(spare_entry_addr, MMU_PAGE_SIZE);
+                mmu_table_ptr
+                    .add(target_entry)
+                    .write_volatile(psram_page | MMU_ACCESS_SPIRAM);
+            }
+        }
+
+        let spare_page_mapping = unsafe { mmu_table_ptr.add(SPARE_PAGE).read_volatile() };
+
+        // All entries mapping flash page 0 will be mapped to the same page later so are only
+        // counted once
+        let mut page0_seen = false;
+        let mut flash_pages = 0;
+        for i in 0..(FLASH_MMU_TABLE_SIZE - 1) {
+            let mapping = unsafe { mmu_table_ptr.add(i).read_volatile() };
+            if mapping & (MMU_INVALID | MMU_TYPE) == MMU_VALID | MMU_ACCESS_FLASH {
+                if mapping & MMU_VALID_VAL_MASK == 0 {
+                    if page0_seen {
+                        continue;
+                    }
+                    page0_seen = true;
+                }
+                flash_pages += 1;
+            }
+        }
+
+        if flash_pages > (psram_size / MMU_PAGE_SIZE as usize) as u32 {
+            panic!("PSRAM is too small to fit a copy of flash");
+        }
+
+        let mut page0_page = None;
+
+        unsafe { Cache_WriteBack_All() };
+        for i in 0..(FLASH_MMU_TABLE_SIZE - 1) {
+            let mapping = unsafe { mmu_table_ptr.add(i).read_volatile() };
+            if mapping & (MMU_INVALID | MMU_TYPE) != MMU_VALID | MMU_ACCESS_FLASH {
+                continue;
+            }
+            if mapping & MMU_VALID_VAL_MASK == 0 {
+                match page0_page {
+                    Some(page) => {
+                        unsafe {
+                            mmu_table_ptr
+                                .add(i)
+                                .write_volatile(page | MMU_ACCESS_SPIRAM)
+                        };
+                        continue;
+                    }
+                    None => page0_page = Some(free_page),
+                }
+            }
+            unsafe { move_flash_to_psram_with_spare(mmu_table_ptr, i, free_page, SPARE_PAGE) };
+            free_page += 1;
+        }
+
+        // Restore spare page mapping
+        unsafe {
+            mmu_table_ptr
+                .add(SPARE_PAGE)
+                .write_volatile(spare_page_mapping);
+            Cache_Invalidate_Addr(SPARE_PAGE_DCACHE_ADDR, MMU_PAGE_SIZE);
+        }
+
+        // Special handling if the spare page was mapped to flash
+        if spare_page_mapping & (MMU_INVALID | MMU_TYPE) == MMU_VALID | MMU_ACCESS_FLASH {
+            unsafe {
+                // We're running from ram so using the first page should not cause issues
+                const SECOND_SPARE: usize = 0;
+                let second_spare_mapping = mmu_table_ptr.add(SECOND_SPARE).read_volatile();
+
+                move_flash_to_psram_with_spare(mmu_table_ptr, SPARE_PAGE, free_page, SECOND_SPARE);
+
+                // Restore spare page mapping
+                mmu_table_ptr.add(0).write_volatile(second_spare_mapping);
+                Cache_Invalidate_Addr(MMU_DBUS_VADDR_BASE + SECOND_SPARE as u32 * MMU_PAGE_SIZE, MMU_PAGE_SIZE);
+            }
+            free_page += 1;
+        }
+
+        psram_size -= free_page as usize * MMU_PAGE_SIZE as usize;
     }
 
     let start = unsafe {
-        const MMU_PAGE_SIZE: u32 = 0x10000;
-        const ICACHE_MMU_SIZE: usize = 0x800;
-        const FLASH_MMU_TABLE_SIZE: usize = ICACHE_MMU_SIZE / core::mem::size_of::<u32>();
-        const MMU_INVALID: u32 = 1 << 14;
-        const DR_REG_MMU_TABLE: u32 = 0x600C5000;
+        let mmu_table_ptr = DR_REG_MMU_TABLE as *const u32;
 
         // calculate the PSRAM start address to map
         // the linker scripts can produce a gap between mapped IROM and DROM segments
@@ -177,7 +294,6 @@ pub(crate) fn init_psram(config: PsramConfig) {
         //
         // More general information about the MMU can be found here:
         // https://docs.espressif.com/projects/esp-idf/en/stable/esp32s3/api-reference/system/mm.html#introduction
-        let mmu_table_ptr = DR_REG_MMU_TABLE as *const u32;
         let mut mapped_pages = 0;
         for i in (0..FLASH_MMU_TABLE_SIZE).rev() {
             if mmu_table_ptr.add(i).read_volatile() != MMU_INVALID {
@@ -208,9 +324,9 @@ pub(crate) fn init_psram(config: PsramConfig) {
         if cache_dbus_mmu_set(
             MMU_ACCESS_SPIRAM,
             start,
-            START_PAGE << 16,
+            free_page << 16,
             64,
-            config.size.get() as u32 / 1024 / 64, // number of pages to map
+            (psram_size / MMU_PAGE_SIZE as usize) as u32, // number of pages to map
             0,
         ) != 0
         {
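
The page counting and PSRAM budgeting in the diff above can be tried out on a host machine. The following is a rough sketch that replays the same bit tests over a plain slice instead of the memory-mapped MMU table. The constants are copied from the diff; `is_valid_flash`, `count_flash_pages` and `remaining_psram_pages` are hypothetical helper names that do not exist in esp-hal.

```rust
/// Entry flag and mask values copied from the diff.
const MMU_INVALID: u32 = 1 << 14;
const MMU_TYPE: u32 = 1 << 15;
const MMU_VALID_VAL_MASK: u32 = 0x3fff;
const MMU_PAGE_SIZE: usize = 0x10000;

/// An entry maps flash when neither the invalid bit nor the SPIRAM type bit is set.
fn is_valid_flash(mapping: u32) -> bool {
    mapping & (MMU_INVALID | MMU_TYPE) == 0
}

/// Count the PSRAM pages needed to hold a copy of every flash-mapped entry.
/// All entries pointing at flash page 0 share one copy, so they count once.
fn count_flash_pages(table: &[u32]) -> u32 {
    let mut page0_seen = false;
    let mut flash_pages = 0;
    for &mapping in table {
        if !is_valid_flash(mapping) {
            continue;
        }
        if mapping & MMU_VALID_VAL_MASK == 0 {
            if page0_seen {
                continue;
            }
            page0_seen = true;
        }
        flash_pages += 1;
    }
    flash_pages
}

/// Pages left for data after the copy, or `None` if the copy does not fit.
fn remaining_psram_pages(psram_size: usize, flash_pages: u32) -> Option<usize> {
    (psram_size / MMU_PAGE_SIZE).checked_sub(flash_pages as usize)
}
```

For example, a table holding three entries for flash page 0, one entry each for flash pages 1 and 5, one invalid entry and one PSRAM entry needs three PSRAM pages. With 2 MiB of PSRAM (32 pages), that leaves 29 pages to map as data. The commit itself panics rather than returning `None` when the copy does not fit.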
