-
Hi there, I found your port while experimenting with porting Peanut to ESPHome for a PoC. When there is silence in the audio, it makes a bit of an engine-rumbling sound. It's probably something wrong with my buffer code, but at least I can make out the audio when it does start playing, which is better than what was happening in Peanut. Excuse the squashed, cursed monochrome 128x64 screen; it was the only thing I had hooked up at the time.

esphomeboygbcemu.mp4

I was wondering if it would be possible to add something like an active memory inspector, similar to how cheat engines work, where you can search for memory values. I could then use this data to trigger events in ESPHome/HA.

Here is an example of what I did with another game called CastleBoy. When you whip the candles/torches in the game, an event handler looks for that action and sends a command to Home Assistant to turn out the lights. You can see a video of it here: https://community.arduboy.com/t/castleboy-castlevania-demake/3011/60

What I was hoping for is to monitor the memory for something happening in the Game Boy game and then fire off an action in ESPHome. Basically a fun AR-like experience toy.
Replies: 4 comments
-
Regarding audio, your symptoms match buffer underruns. In my ESP32-S3 implementation I track how many I2S buffers are currently queued and only submit a new one when there's room. Audio runs entirely on core 0 of the ESP32-S3 (while emulation runs on core 1) and should play without distortion as long as buffering stays ahead.

For an active memory inspector or Cheat Engine-style workflow, you can directly inspect the internal gb_s memory regions (WRAM, VRAM, OAM, HRAM). Below is a simplified example showing how to locate a value across memory regions:

Once you identify a stable address (or addresses), you can:

For automation use cases, it's generally best to run memory inspections at frame boundaries rather than continuously. How you evaluate the data (equality checks, transitions, counters, or bitfields) will depend on the specific game state or trigger you're interested in.

If you have a specific in-game action you're trying to detect (e.g. an animation state, flag, or counter), I'd look at RetroAchievements in particular. It exposes achievement logic that is effectively a set of memory watches and comparisons, which maps very well to this kind of automation use case.

Sounds like a fun project. Feel free to share progress or results as it develops.

Edit: These functions are slower than directly inspecting the internal memory arrays, but they allow you to examine:

- Logical memory reads
- Logical memory writes

If your platform is not compatible with 16-bit or 32-bit DMA, these wider write helpers may also be incompatible. In that case, you can fall back to multiple calls to __gb_write() to achieve the same effect.
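As a rough illustration of the Cheat Engine-style search and the frame-boundary watch described above, here is a minimal sketch. It assumes the emulator's memory regions can be exposed as plain byte arrays; `MemRegion`, `scan_for_value`, `refine`, and `Watch` are hypothetical names for this example and are not part of the port, and the way the arrays are obtained from gb_s is left out on purpose.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view of one memory region (e.g. WRAM or HRAM).
struct MemRegion {
    const char *name;
    uint16_t base;        // Game Boy address of data[0]
    const uint8_t *data;  // backing array inside the emulator (assumed accessible)
    size_t size;
};

struct Match {
    const char *region;
    uint16_t addr;
};

// Read one byte by Game Boy address; returns false if no region covers it.
bool read_byte(const std::vector<MemRegion> &regions, uint16_t addr, uint8_t &out) {
    for (const MemRegion &r : regions) {
        if (addr >= r.base && static_cast<size_t>(addr - r.base) < r.size) {
            out = r.data[addr - r.base];
            return true;
        }
    }
    return false;
}

// First pass: collect every address that currently holds `value`.
std::vector<Match> scan_for_value(const std::vector<MemRegion> &regions, uint8_t value) {
    std::vector<Match> hits;
    for (const MemRegion &r : regions)
        for (size_t i = 0; i < r.size; ++i)
            if (r.data[i] == value)
                hits.push_back({r.name, static_cast<uint16_t>(r.base + i)});
    return hits;
}

// Later passes: keep only earlier candidates that now hold the new value.
std::vector<Match> refine(const std::vector<MemRegion> &regions,
                          const std::vector<Match> &prev, uint8_t value) {
    std::vector<Match> kept;
    for (const Match &m : prev) {
        uint8_t v = 0;
        if (read_byte(regions, m.addr, v) && v == value)
            kept.push_back(m);
    }
    return kept;
}

// Edge-triggered watch, meant to be polled once per emulated frame.
struct Watch {
    uint16_t addr;
    uint8_t target;
    bool was_equal;  // state from the previous poll; start with false

    // True only on the poll where the byte first becomes equal to `target`.
    bool poll(const std::vector<MemRegion> &regions) {
        uint8_t v = 0;
        bool now = read_byte(regions, addr, v) && v == target;
        bool fired = now && !was_equal;
        was_equal = now;
        return fired;
    }
};
```

The intended use would be to scan once, change the value in-game, call `refine` until a handful of candidates remain, then poll a `Watch` at each frame boundary and fire the ESPHome action when it returns true. Firing on the transition rather than on plain equality avoids sending the same event every frame while the value stays put.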
-
Marking answered/resolved pending any further information on your requirements. Feel free to expand on your question if you need further information and I'll reopen the topic.
-
Sorry, I shifted projects; I will report back when I have a chance to try what you suggested. Thank you very much for that rundown, it looks like good information. Following your recommendation above, I talked with the RetroAchievements Discord, which was a great help! They pointed me to the right places to get all the memory locations. I will report on that as well and share it here when I have a moment to poke at the code and test things out.

I assigned the cores the way you described and it ran a bit better on the ESP32-P4, but there is still some unusual sluggishness. You would think the P4 wouldn't bat an eye at this. Maybe the P4 needs some optimizations that differ a little from the S3. I also added some P4-related compile flags, which seemed to help as well. Maybe there are more hidden features there to take advantage of.

build_flags:
  - "-O3"
  - "-ffast-math"
  - "-DBOARD_HAS_PSRAM"
  - "-ftree-vectorize"
  - "-fno-exceptions"

#pragma once
// ESP32-P4 optimization: Place ROM data in cache-friendly memory
#ifdef ESP_PLATFORM
#define ROM_DATA_ATTR __attribute__((section(".rodata"))) __attribute__((aligned(16)))
#else
#define ROM_DATA_ATTR
#endif

void GBEmulator::draw(display::Display &it) {
  if (!framebuffer_) return;
  if (xSemaphoreTake(framebuffer_mutex, pdMS_TO_TICKS(5)) != pdTRUE) return;
  int center_x = 80, center_y = 72;
  int view_w = (int)(160 / zoom_), view_h = (int)(144 / zoom_);
  int src_x_start = center_x - view_w / 2, src_y_start = center_y - view_h / 2;
  if (src_x_start < 0) src_x_start = 0;
  if (src_y_start < 0) src_y_start = 0;
  if (src_x_start + view_w > 160) src_x_start = 160 - view_w;
  if (src_y_start + view_h > 144) src_y_start = 144 - view_h;
  // SIMD-optimized rendering: process 4 pixels at a time
  for (int y = 0; y < 64; y++) {
    int src_y = src_y_start + (y * view_h) / 64;
    if (src_y >= 144) src_y = 143;
    uint8_t *line = &framebuffer_[src_y * 160];
    int x = 0;
    // Process 4 pixels per iteration for vectorization
    for (; x < 124; x += 4) {
      for (int i = 0; i < 4; i++) {
        int src_x = src_x_start + ((x + i) * view_w) / 128;
        if (src_x < 160 && (line[src_x] & 0x03) < 2) {
          it.draw_pixel_at(x + i, y, display::COLOR_ON);
        }
      }
    }
    // Handle remaining pixels
    for (; x < 128; x++) {
      int src_x = src_x_start + (x * view_w) / 128;
      if (src_x < 160 && (line[src_x] & 0x03) < 2) {
        it.draw_pixel_at(x, y, display::COLOR_ON);
      }
    }
  }
  xSemaphoreGive(framebuffer_mutex);
}

I also turned the I2C bus up to 1 MHz. I should probably get a better screen to test on.
-
Apologies on my end as well for the delay; I've been unusually busy this month.

On the ESP32-S3FN8, one thing that helped noticeably was adding […]

The I2C display is very likely a major contributor to the sluggishness you're seeing. In my setup, I'm using an 80 MHz SPI display. A 40 MHz SPI connection was usable, but switching to 80 MHz gave a clear improvement of several FPS in my implementation. On the ESP32-P4 specifically, a MIPI-DSI panel would be ideal, but even a small SPI screen should give you much more headroom than I2C.

One important difference in my renderer is that I never draw individual pixels. Per-pixel drawing dramatically increases display communication overhead. Instead, I batch entire lines, with optional scaling applied by the video thread into a temporary line buffer. The transfer then begins using SPI DMA, with a tight loop that waits for DMA completion while repeatedly yielding to other tasks between checks.

Thread-wise, I dedicate one core exclusively to emulation. On the second core, I run both the audio and video threads. The video thread only outputs lines that were rendered, using a per-line LUT. This gives a pseudo-double-buffering effect without the full memory footprint, since new lines can be rendered as soon as previously completed ones are pushed to the display.

You likely have significantly more memory available than I did. I was working within 512 KB of SRAM, with no PSRAM, using a real-time paging algorithm that adds extra logic to every ROM access. Given that, your final performance should be substantially better, though it may take some time to balance the audio and video thread priorities. I use SPI DMA output on the video thread and yield the task until completion, which allows audio to stay responsive.

It can take a bit of tuning to get thread priorities just right, and if you end up pushing a core close to 100%, it may be worth disabling watchdog timers if they're enabled. Doing so gave me a small but measurable performance gain and prevented resets when the emulation core was fully saturated, something that may or may not apply on the ESP32-P4, depending on how aggressively it schedules.

I'd definitely be interested to hear what ends up helping the most on your end, especially if you run into any P4-specific quirks or optimizations. Feel free to share results or project details once you've had time to poke at things; I'm curious how far the P4 can be pushed.

I'm also curious about the RetroAchievements trigger setup you're using. If you end up with something particularly clean or clever and want to share it, I'm always open to adding small helper functions to the […]
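To make the "batch entire lines instead of drawing pixels" point concrete for a 128-pixel-wide monochrome panel, here is a minimal sketch, not the renderer from this port. It precomputes a source-column table once (a column-mapping table for illustration, distinct from the per-line LUT mentioned above), then packs each 160-pixel Game Boy line into a 16-byte, 1-bit-per-pixel line buffer that could be sent in a single transfer. The names `make_column_lut` and `pack_line` are hypothetical, the "shade below 2 means lit" threshold is borrowed from the draw() code earlier in the thread, and the row-major, MSB-first bit order is an assumption; many monochrome OLED controllers expect a different layout.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr int kSrcWidth = 160;  // Game Boy line width in pixels
constexpr int kDstWidth = 128;  // target panel width in pixels

// Computed once: which source column feeds each destination column.
std::array<uint8_t, kDstWidth> make_column_lut() {
    std::array<uint8_t, kDstWidth> lut{};
    for (int x = 0; x < kDstWidth; ++x)
        lut[static_cast<size_t>(x)] = static_cast<uint8_t>((x * kSrcWidth) / kDstWidth);
    return lut;
}

// Scale and pack one source line (one byte per pixel, shade in the low
// two bits) into a 1-bpp line buffer, most significant bit first.
void pack_line(const uint8_t *src,
               const std::array<uint8_t, kDstWidth> &lut,
               std::array<uint8_t, kDstWidth / 8> &out) {
    out.fill(0);
    for (int x = 0; x < kDstWidth; ++x) {
        const uint8_t shade = src[lut[static_cast<size_t>(x)]] & 0x03;
        if (shade < 2)  // the two darker shades are drawn as "on"
            out[static_cast<size_t>(x / 8)] |= static_cast<uint8_t>(0x80u >> (x % 8));
    }
}
```

With this shape, the per-pixel division moves out of the inner loop and the display sees one buffer per line rather than one call per pixel; how that buffer is handed to the bus (DMA or otherwise) depends on the display driver in use.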