Merged
35 changes: 18 additions & 17 deletions _posts/2025-02-20-why-sleep-for-is-broken-on-esp32.md
@@ -19,8 +19,9 @@ performance, but only in the new version of IDF due to an interaction between
<!-- excerpt end -->

> 🎬
> [Listen to Steve on Interrupt Live](https://youtube.com/live/dwL-PI7TuDY?feature=share)
> at **9AM PT | 12PM ET | 6PM CET on Friday, March 7th** talk about the content
> and motivations behind writing this article.

{% include newsletter.html %}

@@ -160,7 +161,7 @@ hardware timer to do it.
All of this is to say the following:

1. If you care about precision timing, details matter.
2. If you _don't care_ about precision timing but are using it, you are probably
wasting resources.

## C++: the Solution to and Cause of All My Problems
@@ -447,9 +448,9 @@ use `usleep()` instead? Well, `usleep()` is just a wrapper to
consider `usleep()` the "public" exposure of `esp_rom_delay_us()`, but only when
the specified time is less than a system tick period. As mentioned above, this
is a busy wait, and since it does not disable the scheduler, it still allows
other threads _of equal or higher priority_ to run. So, the timing represents a
guaranteed _minimum_ only. More importantly, if there are other threads of lower
priority, it will _not_ context switch during this busy time. It will just sit
in the thread until the wait is over.

This is all good. A guaranteed minimum is how I expect `usleep()` to work.
@@ -463,11 +464,11 @@ say:
> longer than one FreeRTOS tick period. If the time is shorter, the thread will
> busy-wait instead of yielding to another RTOS task.

It should say that sleeping for a time _equal to_ or longer than one tick period
causes yielding rather than busy waiting. In any case, the yielding is done via
`vTaskDelay()`.

There is a problem here, though. The ticks-to-yield calculation often produces
sleep times _less than_ the specified amount.

Let's play out an example. If we wanted to sleep for 15 milliseconds, the
calculations would give us `vTaskDelay(2)`:
@@ -488,7 +489,7 @@ Even though the comment says it is rounding up to compensate for the first tick
potentially not blocking at all, the compensation does not account for the
worst-case minimal timing. In the example I gave, a 15-millisecond request will
sometimes sleep for only 10 milliseconds. Likewise, a 10-millisecond `usleep()`
will sometimes sleep for about 0 milliseconds. The greatest _potential_ shortfall
comes with calling `usleep()` with a multiple of the tick period. In that case,
the time spent may be short by an entire tick period.

@@ -500,7 +501,7 @@ a problem.

According to `man 3 usleep` and
[POSIX](https://pubs.opengroup.org/onlinepubs/009695399/functions/usleep.html),
`usleep()` should always sleep _at least_ the time specified. It is allowed to
sleep more if needed.

> The usleep() function shall cause the calling thread to be suspended from
@@ -523,7 +524,7 @@ the thread will sleep between 0 and 10 milliseconds. It will usually sleep for
less than the time specified.

In IDF v5, calling `std::this_thread::sleep_for(10ms)` almost always calls
`usleep()` _twice_. The first time will use `vTaskDelay(1)`, and it will usually
sleep for less than the time specified. Then, back in `libstdc++`
`__sleep_for()`, the monotonic clock will be checked and it will be seen that
some fractional component of 10 milliseconds remains, causing a second call to
@@ -547,20 +548,20 @@ system tick period, so the blocking `esp_rom_delay_us()` is now called.

So what about time slicing? Even if `esp_rom_delay_us()` blocks, the FreeRTOS
scheduler can switch to another task. First, if this thread has a higher
priority, _no lower-priority task will ever run_. But even if everything is of the
same priority, the CPU will just switch back to the blocking call on the next
round robin, continuing the blocking wait. In our current scenario, this is
horribly inefficient, unnecessary, and unexpected.

Any call to `sleep_for()` longer than the tick period has this problem because
the tick interrupt is asynchronous to the `sleep_for()` call. This means that when
the scheduler returns from `vTaskDelay()`, some random remainder of time will be
spent in `esp_rom_delay_us()` in order to sleep for the _precise_ amount of
time requested.

The new version of `sleep_for()` is much more precise, but at the cost of
computing efficiency on the ESP32 because some fraction of the tick period will
be _busy waited_ instead of yielded. That is very bad to do on an MCU.

Of course, none of this is visible to the application code, and I doubt it
was something intentional from Espressif. It is just a consequence of upgrading
@@ -572,7 +573,7 @@ Did Espressif actually implement `usleep()` wrong? Yes. It needs to be fixed.

For periods equal to or longer than the system tick, `usleep()` can return before the
specified time. It shouldn't do that. It must err on the side of sleeping too
long to ensure it _never_ sleeps too little. So yes, it is broken in my view.
`libstdc++` isn't to blame.

Since `usleep()` is sometimes short by 1 system tick period, we could just add
@@ -671,7 +672,7 @@ while allowing more control over how to perform the sleep when using C++.

## Conclusion

I cut my teeth on bare metal C code where _everything_ was statically allocated.
No `malloc()`. No floating point math because there was no FPU. Custom linker
scripts. Debugging using GPIO pins and an oscilloscope. Using precalculated
value tables to save a few microseconds in an ISR. We ran at 24 MHz. At that
@@ -686,7 +687,7 @@ instruction pipelines.
It seems today that using C++ for firmware brings up a lot of strong reactions.
A lot of embedded people hate it. A lot of people love it. For myself, I think
it can be a great tool, but it has a lot of complexity you need to get right,
_especially_ when using it on an MCU. This seems to be a good example of that.

I sincerely hope `usleep()` is fixed. Until then, don't use
`std::this_thread::sleep_for()` in your IDF v5 projects. It's a waste of time!