-
OK, so I just tried compiling with USE_SCREEN=0, which improves CPU utilization: it is down to 3-10% on both cores when running a local effect. When I start streaming over WiFi, one of the cores goes up to 25%. The playback is still very choppy, though. After just a few seconds of streaming, the connection is lost and this message appears in the debug output: (W) (ReadUntilNBytesReceived)(C1) ERROR: -1 bytes read in ReadUntilNBytesReceived trying to read 84
-
We definitely have effects that burn unrealistic amounts of CPU time,
trying to endlessly draw frames, regardless of how fast the hardware can
actually accept it and how quickly the eyeball can perceive changes. We
also have configurations that just burn more CPU than I can explain and are
deserving of some study with a profiler (that's a good project for anyone
looking to get into it...).
I don't think most of the crowd uses configurations with screens, so you're
unique on that one. It's probably worth you diving in to fix that. There is
some code in ScreenUpdateLoopEntry() that tries to limit the update rate
and that would be a good start.
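As a rough illustration of that kind of rate limit (this is not the code in ScreenUpdateLoopEntry(); the `UpdateThrottle` name and the 100 ms interval are made up for the sketch), a loop can simply skip a redraw when the previous one was too recent:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of NightDriverStrip: allows at most one
// update per minIntervalMs. Timestamps are passed in, so it can be fed
// from millis() on the device or from a fake clock on a desktop.
class UpdateThrottle
{
  public:
    explicit UpdateThrottle(uint32_t minIntervalMs) : _minIntervalMs(minIntervalMs)
    {
        assert(minIntervalMs > 0);
    }

    // Returns true (and records the time) if enough time has passed since
    // the last accepted update; otherwise returns false.
    bool ShouldUpdate(uint32_t nowMs)
    {
        // Unsigned subtraction stays correct across a 32-bit wraparound.
        if (_hasRun && (nowMs - _lastMs) < _minIntervalMs)
            return false;
        _lastMs = nowMs;
        _hasRun = true;
        return true;
    }

  private:
    uint32_t _minIntervalMs;
    uint32_t _lastMs = 0;
    bool     _hasRun = false;
};
```

With `UpdateThrottle screenThrottle(100);`, a screen loop would redraw only when `screenThrottle.ShouldUpdate(millis())` returns true, which caps it at roughly 10 updates per second.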
The WiFi question probably best goes to Dave Plummer. I understand he's
streaming effects daily and that's the primary means of update in his big
display for his home, and he's certainly not enduring crashes every few
seconds. I think he's even on some of the M5 products, though I can't
recall which ones.
On Wed, Dec 6, 2023 at 9:56 AM nthrane wrote:
OK, so I just tried compiling with USE_SCREEN=0 which improves CPU
utilization so that it is down to 3-10% on both cores, if it is running a
local effect.
When I start streaming over wifi, one of the cores goes up to 25%. The
playback is still very choppy, though. After just a few seconds of
streaming, the connection is lost and this message in the debug output:
(W) (ReadUntilNBytesReceived)(C1) ERROR: -1 bytes read in ReadUntilNBytesReceived trying to read 84
(W)
(W) (ProcessIncomingConnectionsLoop)(C1) Error in getting pixel data from wifi
(W)
(W) (SocketServerTaskEntry)(C1) Socket connection closed. Retrying...
(W)
[2477282][E][WiFiClient.cpp:422] write(): fail on fd 54, errno: 11, "No more processes"
-
Hi Niels, For video I have on some builds had to manually set the hardware up to have at least 30 buffers in the build, i.e., #define BUFFERS 30 in globals.h. Hope this helps.
-
There are some delay(10) calls inside the Draw of some strip effects.
That's ... not awesome. They should just be letting the parent rate limit
them to the current frame schedule. That would also allow the real
scheduler in FreeRTOS to count that as idle time to swap to the next task,
like filling BUFFERs or explicitly calling taskYIELD().
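To make the frame-schedule idea concrete, here is a minimal sketch of the arithmetic a parent loop could use. The function names are hypothetical and the 5 ms draw time is invented for illustration; none of this is taken from the NightDriverStrip source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not NightDriverStrip code.

// Length of one frame slot in milliseconds for a given frame rate.
inline uint32_t FramePeriodMs(uint32_t fps)
{
    assert(fps > 0);
    return 1000u / fps;
}

// Time left in the frame slot after a draw that took drawMs. This is what
// the caller can hand back to the scheduler (for example with vTaskDelay
// on FreeRTOS). Returns 0 when the draw overran its slot.
inline uint32_t IdleTimeMs(uint32_t fps, uint32_t drawMs)
{
    const uint32_t period = FramePeriodMs(fps);
    return drawMs >= period ? 0u : period - drawMs;
}

// Illustrative numbers at 20 Hz (50 ms slots), with a made-up 5 ms draw:
//   IdleTimeMs(20, 5)      -> 45 ms of slack left in the slot
//   IdleTimeMs(20, 5 + 10) -> 35 ms if the draw also sits in delay(10)
```

The point of doing it this way is that the effect never blocks on its own; the one place that knows the frame rate decides how long to wait.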
Just napkin-scratching (and not even being totally sure if you chaps are on
USE_WS281X or USE_HUB75), I'd think that two 240 MHz cores should be able to
fill a boatload of buffers at 20 Hz that are being passed to DMACs to pass out
in either case, via FastLED's ESP32 SPI integration or SmartMatrix's RMT
integration that handle the actual hardware babysitting, though at the cost of
some bus contention. We should have half to three quarters of a core doing
local effects (somewhat heavier for SmartMatrix at 2k pixels) and "the rest"
for our networking and debug chatter and such. I may seriously underestimate
what's really going on and/or how much compute we actually have, but I also
know how easy it is to accidentally make a change (in other projects at least)
that causes the system to free-run internally for a long time before it's
noticed.
Profiling embedded apps is a huge pain, but there are success stories
<https://blog.drorgluska.com/2022/12/esp32-performance-profiling.html> with
our approximate configuration. Is Draw() being called faster than our frame
rate? What ARE our highest running paths?
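Before reaching for a full profiler, the first of those questions can be answered with a few lines of instrumentation. The class below is a hypothetical sketch (the `CallRateMeter` name and its interface are assumptions, not existing project code) that counts calls and reports calls per second once per measurement window:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical instrumentation, not existing NightDriverStrip code.
class CallRateMeter
{
  public:
    explicit CallRateMeter(uint32_t windowMs) : _windowMs(windowMs)
    {
        assert(windowMs > 0);
    }

    // Call once per Draw() with the current time in milliseconds.
    // Returns true when a window has just closed; LastRate() then holds
    // the calls per second measured over that window.
    bool Tick(uint32_t nowMs)
    {
        if (!_started)
        {
            // The first call only marks the start of the first window.
            _started = true;
            _windowStartMs = nowMs;
            return false;
        }
        ++_count;
        const uint32_t elapsed = nowMs - _windowStartMs;
        if (elapsed < _windowMs)
            return false;
        _lastRate = (_count * 1000.0) / elapsed;
        _count = 0;
        _windowStartMs = nowMs;
        return true;
    }

    double LastRate() const { return _lastRate; }

  private:
    uint32_t _windowMs;
    uint32_t _windowStartMs = 0;
    uint32_t _count = 0;
    double   _lastRate = 0.0;
    bool     _started = false;
};
```

If the reported rate is well above the configured frame rate, the draw path is free-running and that is the place to look first.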
Someone serious about investigating this (please) may best begin by
measuring where we are now. It doesn't SEEM like we should be in as deep a
hole as we are, and it's possible that we've taken an update or made a
mistake that's forced something into bit-banging mode or some other crazy
path that "works" but is expensive. (I just checked. Our strip builds
proudly announce "ESP32 Hardware SPI support added" so that's not it, but
it's a relatable example...)
RJL
On Wed, Dec 6, 2023 at 9:45 PM mikejohnau wrote:
Hi Niels,
I'm also throwing both effects and video at several different types of the
ESP32 dev board. If you are using the NightDriverServer module, I have
found that in SiteControllers for each location that I had to drop the
FramesPerBuffer and BatchSize down to somewhere between 5 and 10. I notice
that Dave runs his Cabana at 500 frames per buffer, but that kills my
installs totally. Also, in Program.cs I keep the framerate between 24 and
30 for each site.
For video I have on some builds had to manually set the hardware up to
have at least 30 buffers in the build, ie,
#define BUFFERS 30
in globals.h.
Hope this helps.
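For a sense of scale, the numbers above can be worked through in a few lines. This sketch assumes that each buffer holds one frame and that a pixel takes 3 bytes; both are assumptions for illustration, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions for this sketch: one buffer == one frame, 3 bytes per pixel.

// How many seconds of playback a given number of buffers can hold.
double BufferedSeconds(uint32_t buffers, uint32_t framesPerSecond)
{
    assert(framesPerSecond > 0);
    return static_cast<double>(buffers) / framesPerSecond;
}

// Raw pixel memory needed for that many buffers.
uint32_t BufferBytes(uint32_t buffers, uint32_t ledCount)
{
    return buffers * ledCount * 3u;
}

// BufferedSeconds(30, 30) -> 1.0 s
// BufferedSeconds(30, 24) -> 1.25 s
// BufferBytes(30, 256)    -> 23040 bytes
```

Under those assumptions, 30 buffers give the device about one second of slack to ride out a stall on the network side.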
-
Yep, just off the cuff, it seems like we're doing something wrong
somewhere.
The configuration I'm running ATM is a single 256 strip DEMO target, albeit
on ESP32-S3. Display is great.
(I) (loop)(C1) WiFi: WL_CONNECTED, IP: 192.168.2.13, Mem: 186152, LargestBlk: 172020, PSRAM Free: 8239411/8380695, LED FPS: 30 LED Bright: 100%, LED Watts: 20, CPU: 000%, 001%, FreeDraw: 0.015
The web GUI shows we're basically sleep-walking our way through this -
rightfully so.
1%
CPUUSED:0.749999285
- CORE0:0.90
- CORE1:0.60
- IDLE:99.25
I think I'm about to call it a night, but if there's some specific
configuration I can build and sanity check (esp. that doesn't require
hardware....mostly we run identically with or without LEDs actually
attached, whether panels or strips.) LMK and I'll spin up a build and see.
I don't have M5 hardware specifically, but I should be able to get
roughly reproducible results.
20 LEDs is 60 color bytes, or 480 bits. At 0.00000125 seconds per bit
shifting out of the RMT lines, that's only 0.6 milliseconds to draw that
entire chain.
We *should* be sleepwalking through that. We're definitely heavier than
FastLED (which we use), esp. if we're generating the animations or
receiving them via network, but here's a report of 8 channels getting "only"
30 FPS, albeit with an unstated # of LEDs on ESP32.
https://www.reddit.com/r/FastLED/comments/kwojhs/esp32_rmt_unexpected_behaviour/
I don't know that we've done a lot of tuning for the 8 channel or thousands
of LED channel cases, but if you have anything vaguely reasonable, 5fps is
just pointing to us tripping over something.
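The wire-time estimate is easy to check for any strip length. The sketch below uses the 1.25 microseconds per bit quoted above and 24 bits (3 color bytes) per LED, and ignores the reset gap between frames; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// WS2812-style timing: 1.25 microseconds per bit, 24 bits per LED.
// The reset/latch gap between frames is ignored in this estimate.
constexpr double   kMicrosPerBit = 1.25;
constexpr uint32_t kBitsPerLed   = 24;

// Time on the wire to shift out one full frame, in microseconds.
double FrameTimeMicros(uint32_t ledCount)
{
    return ledCount * kBitsPerLed * kMicrosPerBit;
}

// Upper bound on frame rate imposed by the wire time alone.
double MaxFramesPerSecond(uint32_t ledCount)
{
    assert(ledCount > 0);
    return 1e6 / FrameTimeMicros(ledCount);
}

// FrameTimeMicros(20)  ->  600 us
// FrameTimeMicros(60)  -> 1800 us
// FrameTimeMicros(256) -> 7680 us, i.e. a ceiling of about 130 frames/s
```

Either way, the wire itself is nowhere near the bottleneck at single-digit frame rates.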
Espressif claims there are several profiling tools
<https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/app_trace.html>
that gather data and use the semi-hosting protocol to DMA blast the
results to a real computer for processing and there are additional tools in
FreeRTOS. But it's worth a first pass with a flashlight (oh, we have tasks
free-wheeling) before getting after it with a microscope. (finding cache
aliasing problems forcing bus contention on a stack frame or something else
exotic.)
Good night and Good luck!
RJL
On Thu, Dec 7, 2023 at 1:50 AM nthrane wrote:
Hi Robert,
I was also thinking about profiling and found the same article. A sampling
profiler should be good enough to find any hot paths, so I'll see if I can
get that working.
I also agree that solely based on gut feeling, the two cores should have
plenty of power. In any case, the profiling should shed some light on what
is going on.
I'm using WS2812 LEDs, and I even tried shortening the strip (in software)
to 20 leds. That seems like a very low number so the cpu should absolutely
be able to keep up. It still didn't solve the issue, so maybe I'm hitting a
problem with the wifi IO causing interrupts or something.
-
Hi guys,
After watching one of Dave's YouTube videos, I decided to try to replicate the "fireworks" effect that is featured in one of them.
Based on the recommendation from the video, I decided to get the M5 StickC Plus, assuming that it would be more than adequate.
Regardless of whether I install the pre-built version or build and install it through Pio, the M5 seems to be running at 72-100% CPU on one of the cores and more or less idling on the other. I tried moving tasks around between cores, but that does not seem to balance things out because it is just one task that eats all the CPU (the drawing task).
When I try to stream effects to the module, I get very choppy playback and only about 1-3 FPS. I tried slowing down my data rate and shortening the strip length to just 20 LEDs (in software), but that doesn't seem to help.
I tried building and running a few different configurations, including m5plusdemo, laserline and so on, but they all seem to have the same behaviour.
Is it normal that the module is running at almost 100% CPU? I would have thought that there would be plenty of CPU power to drive the streaming (like Dave says in the video: "it's not your grandfather's Arduino").
/Niels