Surprisingly large function call overhead cost #14007
-
The slow-down could be due to the loading of the global variable. Try this modification to the slow code:

    def copy_to_uart():
        ...  # same as before

    def main():
        copy = copy_to_uart  # pre-load global variable into local variable
        while True:
            copy()
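To check whether the global lookup is what costs the time, the two call styles can be timed side by side. This is only a minimal sketch: the body of `copy_to_uart` here is a stand-in (the real one is not shown), and `call_via_global`, `call_via_local` and `timed` are hypothetical helper names. It uses `time.ticks_us`/`time.ticks_diff` on MicroPython and falls back to `time.perf_counter_ns` elsewhere.

```python
try:
    from time import ticks_us, ticks_diff  # available on MicroPython
except ImportError:
    from time import perf_counter_ns  # fallback for CPython

    def ticks_us():
        return perf_counter_ns() // 1000

    def ticks_diff(end, start):
        return end - start


def copy_to_uart():
    # Stand-in body (assumption); the real function does the UART copying.
    return 1


def call_via_global(n):
    total = 0
    for _ in range(n):
        total += copy_to_uart()  # global name is looked up on every iteration
    return total


def call_via_local(n):
    copy = copy_to_uart  # global looked up once and cached in a local
    total = 0
    for _ in range(n):
        total += copy()
    return total


def timed(fn, n):
    # Returns (result, elapsed microseconds) for one run of fn(n).
    start = ticks_us()
    result = fn(n)
    return result, ticks_diff(ticks_us(), start)


if __name__ == "__main__":
    for fn in (call_via_global, call_via_local):
        result, elapsed_us = timed(fn, 100000)
        print(fn.__name__, result, elapsed_us, "us")
```

If the two timings come out close on the target board, the global lookup is probably not the dominant cost and the overhead lies elsewhere.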
-
Thanks for the follow-up, @dpgeorge. The code shown in my question is just a snippet from the real code (to which I provided this link). As noted, I'd already found that globals have a very high cost, so had already eliminated them (I believe) by wrapping everything in an apparently redundant

PS thank you for the amazing tool that is MicroPython. I think, when showered with questions like this one, where it's always "why this?" / "why that?", it may sound like we're all always complaining. If so, apologies. I really appreciate MicroPython and am trying to get the most out of it 👍
-
I have a piece of code where introducing a function call incurs a surprising overhead. Essentially, this first bit of code is far slower than the second version.
Slow code:
Fast code:
I.e. allowing the code to regularly drop out of the function copy_to_uart (only to be immediately re-invoked by the while-loop) has a noticeable effect on how quickly I can echo data through the UART.

Of course, I understand that invoking a function incurs some cost but I'm surprised at the effect - the bytes-per-second (when I try to max things out) drops by about 14%.
Given that there are only three other calls involved, i.e. ipoll, readinto and write, this might not seem too bad. But actually, these get called many times due to the for loop (when one is trying to stream data through the UART as fast as possible). And actually, in my real code, I make many more calls in the copy_to_uart logic (and check all the return values). However, all these calls are into the standard MicroPython libraries.

Is the function call overhead really so much greater for my own code compared to the standard libraries?
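To make the two structures concrete without the original snippets, here is a minimal sketch. It is not the code from my project: it assumes in-memory `io.BytesIO` objects as stand-ins for the UART and the input stream, and `copy_once`, `copy_reinvoked` and `copy_inline` are hypothetical names. One variant drops out of the function after every pass and is re-invoked by the caller's loop; the other keeps the loop inside a single call.

```python
import io


def copy_once(src, dst, buf):
    # One pass: read what is available and forward it, then return.
    n = src.readinto(buf)
    if n:
        dst.write(memoryview(buf)[:n])
    return n


def copy_reinvoked(src, dst, buf):
    # Structure A: the copy function returns after each pass and the
    # surrounding while-loop immediately calls it again.
    total = 0
    while True:
        n = copy_once(src, dst, buf)
        if not n:
            return total
        total += n


def copy_inline(src, dst, buf):
    # Structure B: the same work, but the loop never leaves the function.
    total = 0
    while True:
        n = src.readinto(buf)
        if not n:
            return total
        dst.write(memoryview(buf)[:n])
        total += n


if __name__ == "__main__":
    data = bytes(range(256)) * 16
    for fn in (copy_reinvoked, copy_inline):
        out = io.BytesIO()
        copied = fn(io.BytesIO(data), out, bytearray(64))
        print(fn.__name__, copied, out.getvalue() == data)
```

Both variants move the same bytes; the only difference is one extra Python-level call and return per pass, which is the cost being asked about.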
Note: I'm using a C3 ESP32 and I see that while there's a native emitter for the Xtensa based ESP32s (see py/asmxtensa.c), there's no native emitter for the RISC-V based C3 ESP32s.

I've tried all kinds of things suggested in the MicroPython docs on optimizations and maximizing speed. But none of them, e.g. caching object references, really had a noticeable impact (I'd already discovered that accessing global variables is expensive and so avoid that in my code).
If you want to experiment, you can find a tiny but complete example that includes the above snippet here. To go with it, there's serial_tester.py - you can run it on your laptop/PC like so:

It depends on pyserial, which you'll already have installed if you're using mpremote.

As this code involves tying up UART0, I'd also suggest adding this block before the run call if you want to be able to iterate on uploading code (just press the board's RESET button and the code waits 3 seconds before taking control of UART0, during which time you can upload a new program or connect to the REPL and press ctrl-C):
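A minimal sketch of such a delay, assuming a hypothetical helper named `startup_grace_period` (the exact block from the linked example may differ):

```python
import time


def startup_grace_period(seconds=3, sleep=time.sleep):
    # Pause before the program claims UART0, so that a freshly reset board
    # stays reachable for uploading new code or interrupting from the REPL.
    print("Waiting", seconds, "seconds before taking control of UART0")
    sleep(seconds)


# Intended use, placed just before the program's entry point:
#     startup_grace_period()
#     run()  # the entry point mentioned above, not defined in this sketch
```

The `sleep` parameter exists only so the delay can be swapped out when trying the helper on a PC.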