Replies: 5 comments 5 replies
-
Just a quick question -- is the issue here mostly the limited amount of IRAM? i.e. do you really need a JIT to solve your problem, rather than just a more effective way to use the IRAM you already have?

For example, the native emitter could continue to work as it does right now (i.e. during the compile phase, not at runtime/JIT), but store the generated code in regular RAM and copy it to IRAM as the function is executed (perhaps based on some sort of LRU policy). This is more complicated than it sounds because you'll likely need to do address relocations etc. (classic linker/loader stuff), but compared to implementing a JIT it's a lot simpler. It also wouldn't require making huge modifications to the VM and compiler.

The other thing to consider is that the ESP32 can execute code from flash. So making the native emitter able to emit to an XIP flash region might get you what you want too, without needing to worry about relocations (although you'd want to be more careful about how this is updated w.r.t. wear levelling etc.).

See also #8381 -- this doesn't support native code, but I suspect making native code work with it might be easier than implementing a JIT!
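Roughly, the LRU bookkeeping could look something like this (purely a sketch: none of these names exist in MicroPython, the real thing would live in C inside the VM, and `copy_and_relocate()` stands in for the linker/loader work):

```python
# Hypothetical sketch of LRU bookkeeping for native code blobs cached in IRAM.
class IRAMCodeCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = []  # list of [fn_id, iram_addr, size], most recently used last

    def lookup(self, fn_id):
        for entry in self.entries:
            if entry[0] == fn_id:
                # Hit: move to the back so it is evicted last.
                self.entries.remove(entry)
                self.entries.append(entry)
                return entry[1]
        return None  # miss: caller must insert()

    def insert(self, fn_id, size, copy_and_relocate):
        # Evict least recently used blobs until the new one fits.
        while self.entries and self.used + size > self.capacity:
            _, _, old_size = self.entries.pop(0)
            self.used -= old_size
        iram_addr = copy_and_relocate(fn_id, size)  # assumed helper doing the memcpy + relocations
        self.entries.append([fn_id, iram_addr, size])
        self.used += size
        return iram_addr
```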
-
Hi @jimmo. Yes, lack of IRAM was a pretty big inspiration for this, but not the only one. I can't quite squeeze as much performance out of MicroPython as I wanted, and a JIT would give pretty large performance benefits, at the cost of significant RAM use (I have PSRAM). I have compared the speed of an ESP32 revision 1 without PSRAM and an ESP32 rev 3 with PSRAM, and with specialized builds the difference in executed code speed is not significant.

I could allocate a buffer in IRAM and add a trap for the InstrFetchProhibited exception, check if the accessed address is currently cached in DRAM/PSRAM, and then copy the text there if it exists. This could potentially be abused with intentional instruction fetches at DRAM addresses, so it would be possible to directly execute a payload if it was formatted correctly.

Or I could keep a "page table" with a list of current functions and which ones are currently swapped in to IRAM (and their offset addresses). This would remove the need for an exception handler and for resuming execution with the previous register context.

As for allocating flash pages for execution: this would be the best bet for ensuring the most code fits into an executable region, but there are design reasons why I'm trying to avoid excessive flash writes.

I probably should have explained this before, but I chose MicroPython for my console due to its beginner friendliness, its VFS capability, and its ability to execute bytecode from RAM. I froze the modules that are required for booting into the MicroPython binary, and the rest of the libraries that can be updated in the field are stored in the VFS partition of flash. All files are stored pre-compiled on the filesystem and most of them reside in encrypted virtual disks (like most of my system libraries are stored in a

Also, (if I decided to JIT all of the code while the system was running) if I were to inject a

I think for now I'll write a code analysis tool that picks up long for loops, potentially relocates them, and marks them as native or viper code depending on how hard they are to optimise (and get type information for). Then I'll test it out with the code swapping and see how much of a performance gain I get.
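As a rough illustration of that last idea (purely a sketch; the ast-based heuristic and the threshold below are made up, and a real tool would also need to gather type information for viper):

```python
# Host-side sketch: walk a module's AST and flag long loops as candidates
# for @micropython.native / @micropython.viper.
import ast
import sys

LOOP_WEIGHT_THRESHOLD = 8  # assumed cut-off, counted in AST nodes

def find_hot_loops(source, filename="<module>"):
    tree = ast.parse(source, filename)
    candidates = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.For, ast.While)):
            weight = sum(1 for _ in ast.walk(node)) - 1  # crude size metric
            if weight >= LOOP_WEIGHT_THRESHOLD:
                candidates.append((node.lineno, weight))
    return candidates

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        for lineno, weight in find_hot_loops(f.read(), sys.argv[1]):
            print("%s:%d loop weight %d -> consider native/viper" % (sys.argv[1], lineno, weight))
```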
-
Before you attempt to JIT I would recommend looking at the existing AOT (ahead-of-time) compiler (native/viper) and trying to measure whether it would give you enough of a performance boost. Also definitely look at #8381 (essentially dynamic freezing of code); that might be enough to alleviate your IRAM issues.
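For example, the same inner loop can be compared directly across the three emitters (the decorators and the `time.ticks_*` functions are existing MicroPython APIs; the loop body is just a stand-in for real hot code):

```python
# Compare the bytecode, native, and viper emitters on the same hot loop.
import micropython
import time

def sum_bytecode(n):
    s = 0
    for i in range(n):
        s += i
    return s

@micropython.native
def sum_native(n):
    s = 0
    for i in range(n):
        s += i
    return s

@micropython.viper
def sum_viper(n: int) -> int:
    s = 0
    for i in range(n):
        s += i
    return s

def bench(fn, n=10000):
    t0 = time.ticks_us()
    fn(n)
    return time.ticks_diff(time.ticks_us(), t0)

for fn in (sum_bytecode, sum_native, sum_viper):
    print(fn, bench(fn), "us")
```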
-
FWIW, would you consider using an ESP32-S2 or S3 in your design? Those allow putting executable sections into PSRAM.
-
I was also thinking about the gaming abilities of MicroPython. Putting a bit more thought into the algorithms lets you write things like fast triangle routines and the like. The following demo runs on stock MicroPython, meaning that the video driver and the 3D routines are written in MicroPython, and the result is not that bad: https://youtube.com/shorts/EcZD9xHFwBc

However, you see some stuttering every few seconds due to garbage collection. Calling garbage collection every frame, for example, makes things smoother but also much slower. GC is IMHO the major problem with gaming.

If performance is an issue in general you can of course always move stuff to the native side. For a gaming platform you could, for example, include video and audio drivers and things like sprite engines natively. This may result in a pretty fast gaming setup, leaving only the code of the specific game on the Python side.
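For instance, the per-frame collection pattern mentioned above looks roughly like this (`render_frame()` is just a placeholder for the game's own drawing code, and the 33 ms frame budget is an assumption):

```python
# Pay the GC cost once per frame instead of in unpredictable bursts.
import gc
import time

TARGET_FRAME_MS = 33  # assumed ~30 fps budget

def render_frame():
    # Placeholder for the game's actual update/draw work.
    pass

def main_loop():
    while True:
        t0 = time.ticks_ms()
        render_frame()
        gc.collect()  # short, predictable pause every frame
        dt = time.ticks_diff(time.ticks_ms(), t0)
        if dt < TARGET_FRAME_MS:
            time.sleep_ms(TARGET_FRAME_MS - dt)
```

If collecting every frame turns out to be too slow, `gc.threshold()` is another existing knob for tuning when automatic collections are triggered.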
-
I am aware that a JIT was previously mentioned in #4085, but I'm willing to work on this, so please hear me out.
I'm in the process of building a game console that utilizes MicroPython. I started over a year ago and I'm getting close to done, but I have a few issues. I've been puzzling for a while over how to make MicroPython faster. I know I can write C modules for certain functionality and APIs, but game developers only have access to natmod and the viper and native decorators. I could be wrong, but I noticed that the code compiled by the viper and native decorators sticks around long after it's needed. Also, I can only fit small amounts of code into IRAM, so when using large amounts of code that needs to run natively at the same time, you run out of RAM.
So, after a few weeks of thought (and a lot of time spent playing on emulated consoles), I finally got an idea: a JIT compiler. It will definitely increase memory usage and code size, but it would result in significant speed boosts in certain cases.
So, if I'm remembering correctly, a JIT consists of the following components:
This looks like a pretty complex system (and I believe the current compiler compiles a whole input file rather than a single function, though I could be wrong), and it would take a lot of time and effort to make a decent JIT.
I'm willing to do most if not all of the work on this. If you have any suggestions, @dpgeorge, please let me know. And if this JIT turns out to work pretty well, then I'll submit a PR to merge it into MicroPython as an optional component, because I'm sure most people don't need a JIT and it will take a lot of memory (especially IRAM).