Skip to content

ENH Improve performance of directory truncation#685

Merged
emteknetnz merged 1 commit intosilverstripe:3from
creative-commoners:pulls/3/more-performant-fs
May 22, 2025
Merged

ENH Improve performance of directory truncation#685
emteknetnz merged 1 commit intosilverstripe:3from
creative-commoners:pulls/3/more-performant-fs

Conversation

@GuySartorelli
Copy link
Copy Markdown
Member

@GuySartorelli GuySartorelli commented May 5, 2025

This PR does a few things:

  1. In FlysystemAssetStore::truncateDirectory(), check if the directory is empty more efficiently (see ENH Improve performance of directory truncation #685 (comment) and new Filesystem::isEmpty() method)
  2. Call FlysystemAssetStore::truncateDirectory() less often (mostly useful for adapters that can't do the empty check efficiently, e.g. S3 adapter which gets 1000 files per paginated fetch) now done in FIX Don't try truncate the same directory multiple times. #686
  3. Look for variants using a glob match pattern, which boosts performance specifically for the local filesystem adapter. With any other adapter, developers are responsible for decorating the adapter with a subclass that implements GlobContentLister if they want performance gains. This would be possible with the FTP adapter for example but not with the AWS S3 adapter.

TODO

  • Tests (will do after rebase)

Dependencies

Requires the following PRs to be merged first (they are currently included in this PR's changset but will be removed on rebase after those are merged into the base branch)

Issue

Comment thread src/FilenameParsing/FileIDHelperResolutionStrategy.php
Comment thread src/Flysystem/FlysystemAssetStore.php Outdated
Comment on lines +605 to +609
// Truncate any directories that held removed files
foreach (array_keys($tryTruncate) as $dir) {
$this->truncateDirectory($dir, $fs);
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In almost 100% of cases all variants will be in the same directory, so we'll only have one call here, instead of one per variant.

The only reason we still have to get an array of possible directories is because the way findVariants() and FileIDHelper are implemented allows for the possibility of custom implementations where variants could be in different directories.

Comment thread src/Flysystem/FlysystemAssetStore.php
@GuySartorelli GuySartorelli force-pushed the pulls/3/more-performant-fs branch from 722a27f to 5e75c0b Compare May 6, 2025 04:39
Comment thread src/Flysystem/GlobContentLister.php
Comment thread src/Flysystem/LocalFilesystemAdapter.php
Comment thread src/Flysystem/Filesystem.php
Comment thread src/FilenameParsing/FileIDHelperResolutionStrategy.php Outdated
@GuySartorelli GuySartorelli force-pushed the pulls/3/more-performant-fs branch 2 times, most recently from de340c8 to 4b58d25 Compare May 12, 2025 03:16
@GuySartorelli
Copy link
Copy Markdown
Member Author

@axllent There's a bit more work for me to do on this PR but I think it's in a state where it can be validated. Can you please check if installing this PR makes a noticeable difference in your use case?

@axllent
Copy link
Copy Markdown
Contributor

axllent commented May 12, 2025

Hi @GuySartorelli. To try give you the most accurate feedback, I set up a test project (with no custom modules) to ensure it's an accurate test. I programmatically created 5000 images, each with 7 variants (ie: total of 40,000 physical files in the directory). Then, in a build task I deleted the first 10 images and timed each deletion.

Using the official silverstripe/assets (beta-1 release):

vendor/bin/sake tasks:delete-files
Running task 'Delete Files Task'
Deleted 17470268002784.jpg in 3.383248090744 seconds
Deleted 17470268009409.jpg in 3.3241040706635 seconds
Deleted 17470268014878.jpg in 3.3198189735413 seconds
Deleted 17470268020674.jpg in 3.3776569366455 seconds
Deleted 1747026802615.jpg in 3.462189912796 seconds
Deleted 17470268031524.jpg in 3.3906650543213 seconds
Deleted 17470268036855.jpg in 3.4072329998016 seconds
Deleted 17470268043229.jpg in 3.3566009998322 seconds
Deleted 17470268048686.jpg in 3.3584780693054 seconds
Deleted 17470268053977.jpg in 3.4811389446259 seconds

Task 'Delete Files Task' completed successfully in 34 seconds

Then I replaced the assets module with your repo (pulls/3/more-performant-fs branch):

vendor/bin/sake tasks:delete-files
Running task 'Delete Files Task'
Deleted 174702680593.jpg in 0.13952803611755 seconds
Deleted 17470268065605.jpg in 0.085838079452515 seconds
Deleted 17470268071156.jpg in 0.077501058578491 seconds
Deleted 17470268076492.jpg in 0.075749158859253 seconds
Deleted 17470268081866.jpg in 0.086363077163696 seconds
Deleted 1747026808716.jpg in 0.091778039932251 seconds
Deleted 17470268092713.jpg in 0.10025715827942 seconds
Deleted 17470268098203.jpg in 0.10659098625183 seconds
Deleted 17470268103785.jpg in 0.10397505760193 seconds
Deleted 17470268109222.jpg in 0.093451976776123 seconds

Task 'Delete Files Task' completed successfully in one second

WOW, talk about a performance boost, well done dude! 🥳

I can confirm the variants were correctly deleted too, and I even ran the task over all remaining images and can also confirm there were no stragglers afterwards. Mission success! FWIW, once complete, I think this should really be backported to SS5 too.

Let me know if you require any more info.

@GuySartorelli
Copy link
Copy Markdown
Member Author

Awesome, thank you. That's the result I was hoping for 🚀

FWIW, once complete, I think this should really be backported to SS5 too.

Some of this will be merged into 5.4, but some of it requires new API so will need to be in 6.1.

@axllent
Copy link
Copy Markdown
Contributor

axllent commented May 12, 2025

Awesome.

some of it requires new API so will need to be in 6.1.

That's a shame given we can't plan around an unspecified date (and this heavily impacts a project we're working on), but I understand re: the API change. Oh well, the main thing is you have hit the nail on the head so it will be fixed at some point!

@GuySartorelli
Copy link
Copy Markdown
Member Author

That's a shame given we can't plan around an unspecified date (and this heavily impacts a project we're working on)

CMS 6.1 will be released in October (minor releases are every 6 months in April and October as per the minor release policy).
Hopefully the changes in #686 will help a little in the meantime.

@GuySartorelli GuySartorelli force-pushed the pulls/3/more-performant-fs branch 2 times, most recently from 0127032 to 756c6ba Compare May 14, 2025 02:23
) {
$this->pathPrefixer = new PathPrefixer($location);
$this->linkHandling = $linkHandling;
$this->visibility = $visibility ??= new PortableVisibilityConverter();
Copy link
Copy Markdown
Member Author

@GuySartorelli GuySartorelli May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See parent class constructor - though since that sets private properties we have to do it here as well to access them

@GuySartorelli GuySartorelli force-pushed the pulls/3/more-performant-fs branch from 756c6ba to 997ae16 Compare May 15, 2025 01:12
Copy link
Copy Markdown
Member

@emteknetnz emteknetnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick question before I review further:

We now have this in SilverStripe\Assets\Flysystem\FileSystem

    /**
     * Check if a directory is empty
     */
    public function isEmpty(string $location): bool
    {
        // listContents() uses generators, so we can start the iterator and return false if there's a single item.
        // In most cases this will be orders of magnitude faster than checking if $this->listContents($location)->toArray() is empty.
        foreach ($this->listContents($location) as $item) {
            return false;
        }
        return true;
    }

Is this all we need? Do we actually need all of the glob stuff to solve the original issue?

Comment thread src/Flysystem/LocalFilesystemAdapter.php Outdated
Comment thread src/FilenameParsing/FileIDHelperResolutionStrategy.php
Comment thread src/FilenameParsing/GlobbableFileIDHelper.php
Comment thread src/Flysystem/LocalFilesystemAdapter.php Outdated
@GuySartorelli
Copy link
Copy Markdown
Member Author

Is this all we need? Do we actually need all of the glob stuff to solve the original issue?

We need both, as they both improve things in different ways, for slightly different use cases.

isEmpty() improves things when truncating directories i.e. when something gets deleted. But it does nothing to improve other cases. The globbing improves things any time we search for a variant, which does happen when deleting (further improving the performance of that scenario) but also in other cases such as publishing, and making new variants (which will sometimes check if that variant already exists before recreating it).

@GuySartorelli GuySartorelli force-pushed the pulls/3/more-performant-fs branch from 997ae16 to 1b5fc89 Compare May 19, 2025 05:12
Comment thread src/FilenameParsing/FileIDHelper.php
@GuySartorelli GuySartorelli force-pushed the pulls/3/more-performant-fs branch from 1b5fc89 to d131ffb Compare May 19, 2025 05:14
@GuySartorelli GuySartorelli marked this pull request as ready for review May 19, 2025 21:07
Comment thread src/Flysystem/Filesystem.php
Comment thread src/Flysystem/LocalFilesystemAdapter.php
Comment thread src/FilenameParsing/NaturalFileIDHelper.php Outdated
Comment thread src/FilenameParsing/HashFileIDHelper.php Outdated
@GuySartorelli GuySartorelli force-pushed the pulls/3/more-performant-fs branch from d131ffb to dda86b2 Compare May 20, 2025 02:43
Comment thread src/Flysystem/Filesystem.php
@GuySartorelli
Copy link
Copy Markdown
Member Author

#685 (comment)

Comment thread src/Flysystem/Filesystem.php
Comment thread src/FilenameParsing/HashFileIDHelper.php
- Better performance to check if directory is empty
- Better performance checking for variants using glob where possible
@GuySartorelli GuySartorelli force-pushed the pulls/3/more-performant-fs branch from dda86b2 to 8c66e83 Compare May 22, 2025 00:17
@emteknetnz emteknetnz merged commit 6e262a2 into silverstripe:3 May 22, 2025
8 checks passed
@emteknetnz emteknetnz deleted the pulls/3/more-performant-fs branch May 22, 2025 03:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants