feat(slices): add in-memory caching for slice content and stitched output #39335
kdichev wants to merge 1 commit into gatsbyjs:master from
Conversation
From my perspective, I’m working with a fairly large Gatsby site, around 5,000 active pages (originally 25,000, but I’ve removed outdated content). Each page includes 8 unique slices. Before making changes, the slice processing step alone took about 300 seconds, which was surprisingly close to the time required for optimizing 6,000 images. I reviewed the code and introduced some performance improvements that brought the slice processing time down from ~300s to ~80s. I also tested whether the regex logic was a bottleneck, but replacing it with a custom parser didn’t yield any speed gains. However, moving the slice queue from fastq to worker threads brought cold build time down further to 35s, and hot builds (with slice changes) to around 50s. I noticed that my machine’s build resources weren’t being fully utilized before, but after introducing workers, usage hit full capacity. This could be a solid further general improvement. Let me know if this is something you'd consider viable, and I’d be happy to add it.
This is overall a reasonable change. The main thing I worry about here is that the cached content is strongly held in memory, and with the setup as-is it might result in out-of-memory errors that were not happening before, given a sufficiently large number of slice variants and/or large slice variant content. As this is an optimization attempt and the source of the data is still in files, I think some kind of lru-cache OR wrapping the content in a WeakRef would be advised, to protect against unbounded growth of strongly referenced content that would prevent allocated memory from ever being reclaimed under memory pressure.
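To illustrate the bounded-cache idea, here is a minimal LRU sketch. This is hypothetical code, not the PR's implementation; in practice the `lru-cache` npm package would likely be used instead. It relies on the fact that a JavaScript `Map` preserves insertion order, so re-inserting a key on each read moves it to the "most recently used" end, and the first key is always the least recently used.

```javascript
// Minimal LRU cache sketch (illustrative; the class name and
// `maxEntries` limit are assumptions, not part of the actual PR).
class SliceLruCache {
  constructor(maxEntries = 100) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    // Re-insert to mark this key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (first key in the Map).
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }
}
```

Because eviction only drops the strong reference while the slice files remain on disk, an evicted entry simply falls back to a re-read on the next miss rather than being lost.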
@pieh This is reasonable feedback, thanks for noting the memory issue! I honestly didn’t think about that at all. I’ve got a fairly powerful machine, so I guess it hasn’t been high on my list of concerns, heh. I’ll work on the suggestions and see how it goes. If there’s already something similar in the repo, and it’s not a burden, I’d appreciate a link so I can take inspiration and stay aligned with accepted practices here. |
This PR adds two simple in-memory caches to speed up the HTML stitching process:
sliceCache - stores the raw HTML of each slice after it's read from disk
stitchedSliceCache - stores the fully-stitched version of each slice to avoid re-processing
With 5,000+ pages being stitched, we were re-reading and re-stitching the same slices many times. Most of our slices are reused heavily, so caching makes a big difference.
On local benchmarks, this reduced stitching time for 5,000 pages with 8 slices per page from ~300s to ~80s.
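The two-level caching described above can be sketched as a pair of read-through caches. The function and parameter names below (`readFromDisk`, `stitch`, and the slice ID keys) are stand-ins for the real Gatsby internals, not the actual PR code:

```javascript
// Hypothetical sketch of the two caches described above.
const sliceCache = new Map();         // sliceId -> raw HTML read from disk
const stitchedSliceCache = new Map(); // sliceId -> fully stitched HTML

// Return the raw slice HTML, hitting the disk only on a cache miss.
function getRawSliceHtml(sliceId, readFromDisk) {
  let html = sliceCache.get(sliceId);
  if (html === undefined) {
    html = readFromDisk(sliceId);
    sliceCache.set(sliceId, html);
  }
  return html;
}

// Return the stitched slice, re-stitching only on a cache miss.
function getStitchedSlice(sliceId, readFromDisk, stitch) {
  let stitched = stitchedSliceCache.get(sliceId);
  if (stitched === undefined) {
    stitched = stitch(getRawSliceHtml(sliceId, readFromDisk));
    stitchedSliceCache.set(sliceId, stitched);
  }
  return stitched;
}
```

With heavily reused slices, each slice is read and stitched once instead of once per page, which is where the ~300s to ~80s improvement comes from.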
Next steps:
by introducing workers, cold build speed improves from ~300s to ~35s