Description
Bug Report
- Yes, I reviewed the contribution guidelines.
- Yes, more specifically, I reviewed the guidelines on how to write clear bug reports.
Describe the current, buggy behavior
The get_file_list() method in src/Dist_Archive_Command.php (lines 493-536) uses RecursiveIteratorIterator, which visits every file in the directory tree, including all files inside directories that .distignore says to ignore. This causes severe performance degradation when an ignored directory such as node_modules is large.
The problem occurs because:
- The RecursiveIteratorIterator descends into every subdirectory, including ignored ones like node_modules/
- For each file (potentially 30,000+ files in node_modules), the iterator:
  - Creates an SplFileInfo object
  - Enters the foreach loop
  - Calculates the relative filepath
  - Calls $this->checker->isPathIgnored() to check if it should be excluded
- Only after all this work does it decide not to include the file in the archive
The check if ( $this->checker->isPathIgnored( $relative_filepath ) ) happens inside the loop for every single item found, rather than preventing descent into ignored directories in the first place.
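The pattern described above can be sketched as follows. This is not the code from src/Dist_Archive_Command.php; it is a minimal illustration that assumes PHP 7.1+ and uses hypothetical names (is_path_ignored_sketch() stands in for $this->checker->isPathIgnored(), and list_files_post_filter() is invented for this example). It counts how many items the loop visits so the cost of filtering after descent is visible.

```php
<?php
// Hypothetical stand-in for $this->checker->isPathIgnored(). The real checker
// evaluates .distignore patterns; this one only matches the first path segment.
function is_path_ignored_sketch( string $relative_path, array $ignored_dirs ): bool {
	$first_segment = explode( '/', ltrim( $relative_path, '/' ) )[0];
	return in_array( $first_segment, $ignored_dirs, true );
}

// Walks the entire tree and filters inside the loop.
// Returns [ kept file paths, number of items the loop visited ].
function list_files_post_filter( string $root, array $ignored_dirs ): array {
	$root     = rtrim( $root, '/\\' );
	$iterator = new RecursiveIteratorIterator(
		new RecursiveDirectoryIterator( $root, FilesystemIterator::SKIP_DOTS ),
		RecursiveIteratorIterator::SELF_FIRST
	);

	$kept    = [];
	$visited = 0;
	foreach ( $iterator as $item ) {
		++$visited;
		// Path relative to the root, normalized to forward slashes.
		$relative = str_replace( '\\', '/', substr( $item->getPathname(), strlen( $root ) + 1 ) );
		if ( is_path_ignored_sketch( $relative, $ignored_dirs ) ) {
			continue; // Too late to save work: the iterator has already descended.
		}
		if ( $item->isFile() ) {
			$kept[] = $relative;
		}
	}
	sort( $kept );
	return [ $kept, $visited ];
}
```

With a tree containing plugin.php and node_modules/ holding two files, the loop visits four items (the directory, its two files, and plugin.php) to keep one file. The visit count grows with the size of the ignored directory.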
Describe how other contributors can replicate this bug
- Create a WordPress plugin with a node_modules directory containing 30,000+ files (or any large directory)
- Add node_modules to your .distignore file:
node_modules
.git
vendor
tests
- Run wp dist-archive . build.zip from the plugin directory
- Observe that the command takes several minutes to complete
- The command iterates through all 30,000+ files in node_modules even though the entire directory is ignored
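To make the reproduction steps repeatable, a throwaway fixture can be generated with a short script. The sketch below assumes PHP 7.1+; the function name create_repro_plugin(), the pkg-N directory layout, and the minimal plugin.php header are all hypothetical choices for this example, not anything required by the command.

```php
<?php
// Creates a disposable plugin directory with a large node_modules tree and the
// .distignore from the steps above. Returns the number of filler files written.
function create_repro_plugin( string $plugin_dir, int $file_count, int $files_per_dir = 100 ): int {
	mkdir( $plugin_dir . '/node_modules', 0777, true );

	// Minimal plugin file (assumed to be sufficient for a test archive).
	file_put_contents( $plugin_dir . '/plugin.php', "<?php\n/* Plugin Name: Repro */\n" );
	file_put_contents( $plugin_dir . '/.distignore', "node_modules\n.git\nvendor\ntests\n" );

	$created = 0;
	for ( $i = 0; $i < $file_count; $i++ ) {
		// Spread files over subdirectories so the tree resembles real packages.
		$dir = sprintf( '%s/node_modules/pkg-%d', $plugin_dir, intdiv( $i, $files_per_dir ) );
		if ( ! is_dir( $dir ) ) {
			mkdir( $dir, 0777, true );
		}
		file_put_contents( sprintf( '%s/file-%d.js', $dir, $i ), "// filler\n" );
		++$created;
	}
	return $created;
}
```

Calling create_repro_plugin( sys_get_temp_dir() . '/repro-plugin', 30000 ) and then running wp dist-archive . build.zip from that directory should exercise the slow path described in this report.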
Describe what you would expect as the correct outcome
The iterator should skip descending into directories that are marked as ignored in .distignore, avoiding the need to iterate through their contents entirely.
For a project with node_modules containing 30,000 files:
- Current behavior: 30,000+ iterations and isPathIgnored() checks, taking several minutes
- Expected behavior: Skip the node_modules directory at the directory level, only iterate through files that need to be processed, completing in seconds
Let us know what environment you are running this on
OS: [Various - affects all operating systems]
PHP version: [Various - affects all PHP versions]
WP-CLI version: [Various - affects current versions using this package]
The issue is present in the current implementation regardless of environment.
Provide a possible solution
The solution is to use a RecursiveFilterIterator to filter out ignored directories before the RecursiveIteratorIterator descends into them. This way, the iterator never enters ignored directories, avoiding thousands of unnecessary iterations.
The filter would check at the directory level: "If this is a directory and it's ignored in .distignore, don't descend into it." This is fundamentally different from the current approach which checks every file after already finding it.
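A minimal sketch of that idea, assuming PHP 7.1+, might look like the following. The class name Ignored_Dir_Filter_Sketch, the helper list_files_skipping_ignored_dirs(), and the $is_ignored callable (a stand-in for $this->checker->isPathIgnored()) are hypothetical. How the real checker expects directory paths to be formatted, and how it treats negated patterns inside an ignored directory, is not addressed here.

```php
<?php
// Rejects ignored directories so the outer RecursiveIteratorIterator never
// yields them and never descends into them.
class Ignored_Dir_Filter_Sketch extends RecursiveFilterIterator {
	private $root;
	private $is_ignored;

	public function __construct( RecursiveIterator $iterator, string $root, callable $is_ignored ) {
		parent::__construct( $iterator );
		$this->root       = rtrim( $root, '/\\' );
		$this->is_ignored = $is_ignored;
	}

	public function accept(): bool {
		$item = $this->current();
		if ( ! $item->isDir() ) {
			return true; // Files are still checked individually by the caller.
		}
		$relative = str_replace( '\\', '/', substr( $item->getPathname(), strlen( $this->root ) + 1 ) );
		return ! ( $this->is_ignored )( $relative );
	}

	// Overridden because the constructor takes extra arguments that must be
	// passed on to the filter wrapping each subdirectory.
	public function getChildren(): RecursiveFilterIterator {
		return new self( $this->getInnerIterator()->getChildren(), $this->root, $this->is_ignored );
	}
}

// Returns [ kept file paths, number of items the loop visited ].
function list_files_skipping_ignored_dirs( string $root, callable $is_ignored ): array {
	$root     = rtrim( $root, '/\\' );
	$filtered = new Ignored_Dir_Filter_Sketch(
		new RecursiveDirectoryIterator( $root, FilesystemIterator::SKIP_DOTS ),
		$root,
		$is_ignored
	);

	$files   = [];
	$visited = 0;
	foreach ( new RecursiveIteratorIterator( $filtered, RecursiveIteratorIterator::SELF_FIRST ) as $item ) {
		++$visited;
		if ( ! $item->isFile() ) {
			continue;
		}
		$relative = str_replace( '\\', '/', substr( $item->getPathname(), strlen( $root ) + 1 ) );
		if ( ! $is_ignored( $relative ) ) {
			$files[] = $relative;
		}
	}
	sort( $files );
	return [ $files, $visited ];
}
```

Because the filter wraps the RecursiveDirectoryIterator rather than the flattened output, a rejected directory costs one accept() call regardless of how many files it contains.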
Provide additional context/Screenshots
This is a known performance pitfall when RecursiveIteratorIterator is used without a recursive filter. The iterator flattens the directory tree as it traverses, so a check applied inside the loop runs only after the iterator has already descended into each directory.
Related discussion: The issue was previously mentioned in #81 (comment)
For WordPress plugins with modern build processes (npm packages, composer vendor directories, etc.), this performance issue makes the dist-archive command nearly unusable without manually deleting these directories first, which defeats the purpose of having a .distignore file.