ENH Cache mime type info to reduce memory usage#702
ENH Cache mime type info to reduce memory usage#702lozcalver wants to merge 1 commit intosilverstripe:3from
Conversation
|
Please do split these into separate PRs - I see no reason to delay merging the straight forward enhancement while we discuss the caching of the mimetype. With the CMS 6.2 beta release just a few weeks away, it'd be nice to get the easy win in even if it takes a while to discuss the other one. |
7061ea9 to
6b036f2
Compare
There was a problem hiding this comment.
Thanks for splitting that out.
Cache type
My immediate thought was that the database feels like a more suitable place for this - we already, but I immediately discounted that when I realised that won't work for some variants (e.g. $file->Convert('webp') will create a variant with a different mimetype than the file stored in the db). So this is probably the correct type of cache for this purpose.
Actual improvement
Do you have any benchmarks for this approach? In the issue you showed the difference between the current implementation of getIsImage() and simply returning true. This change while obviously more appropriate also still has more going on than that, so I'd like to see there is a speed improvement that outweighs the increased memory usage.
Your own questions
You posed these questions in the issue:
(A) how big might that cache end up being?
(B) presumably we’d need to flush it whenever the file hash changed (or at the very least include the file hash in the key)?
Do you have answers to these?
For B I see you've included the hash in the key - though of course that approach means the cache size will grow whenever that hash changes, so the answer to A seems important here.
| /** @var CacheInterface $cache */ | ||
| $cache = Injector::inst()->get(CacheInterface::class . '.FlysystemAssetStore'); | ||
| $cache->clear(); |
There was a problem hiding this comment.
Do the cache adapters not get flushed on their own? (I genuinely can't remember)
| { | ||
| // Flysystem calculates the mime type from the file contents, which is memory intensive, so we cache the result | ||
| $cache = $this->getCache(); | ||
| $key = md5($filename . $hash . $variant); |
There was a problem hiding this comment.
| $key = md5($filename . $hash . $variant); | |
| $key = 'mimetype__' . md5($filename . $hash . $variant); |
This is a cache for the whole store - we aren't using it for anything right now, but we easily could. Because of that we should namespace the cache key.
I was thinking about this a bit more earlier, it’s really difficult to even estimate. It depends on many things, including:
My main concern actually isn’t the size of the cache – each entry is a simple string, so is only a handful of bytes – but more how that cache is handled. Some hosts (particularly shared hosting platforms) have restricted numbers of inodes, so if the underlying cache implementation creates one tiny file per file/variant then we risk eating those limits quickly. We can’t code around unknown theoretical limits on shared hosting environments of course, but it’s something to be mindful of anyway. Sticking them all in one big array that we then save into the cache risks the opposite problem: if a site has millions of files/variants, that one array could be a big memory hog. One to think about... |
|
Closing this because I can’t think of a solution I’m happy with |
Description
This is a possible solution to #700 for discussion. The cache key includes the file hash, and I think it’s reasonable to expect that if someone updates the file in the filesystem manually, they’ll also update the hash in the database as there’s a bunch of logic throughout core that depends on that.
Manual testing steps
XHProf, this snippet should be a reasonable benchmark:
You should see a difference in wall time, memory usage and the number of function calls.
Issues
Pull request checklist