
Performance Issue: RecursiveIteratorIterator scans all files in ignored directories before filtering #115

@alimuzzaman

Description


Bug Report

Describe the current, buggy behavior

The get_file_list() method in src/Dist_Archive_Command.php (lines 493-536) uses RecursiveIteratorIterator, which iterates through every single file in the directory tree, including all files inside directories that should be ignored according to .distignore. This causes severe performance degradation when dealing with large ignored directories like node_modules.

The problem occurs because:

  1. The RecursiveIteratorIterator descends into every subdirectory, including ignored ones like node_modules/
  2. For each file (potentially 30,000+ files in node_modules), the iterator:
    • Creates an SplFileInfo object
    • Enters the foreach loop
    • Calculates the relative filepath
    • Calls $this->checker->isPathIgnored() to check if it should be excluded
  3. Only after all this work does it decide not to include the file in the archive

The check if ( $this->checker->isPathIgnored( $relative_filepath ) ) happens inside the loop for every single item found, rather than preventing descent into ignored directories in the first place.
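
The pattern described above can be reproduced in isolation. The following is a minimal sketch, not the actual get_file_list() code: the closure $is_ignored is a hypothetical stand-in for $this->checker->isPathIgnored(), the SELF_FIRST mode and the temporary directory layout are assumptions made for illustration, and the counter only exists to show how many items reach the check.

```php
<?php
// Build a tiny throwaway tree: one ignored directory plus two files to keep.
$root = sys_get_temp_dir() . '/distignore-demo-' . uniqid();
mkdir( $root . '/node_modules/pkg', 0777, true );
mkdir( $root . '/src', 0777, true );
foreach ( array( 'node_modules/a.js', 'node_modules/pkg/b.js', 'src/main.php', 'readme.txt' ) as $file ) {
    file_put_contents( $root . '/' . $file, '' );
}

// Hypothetical stand-in for $this->checker->isPathIgnored(); counts its calls.
$checks     = 0;
$is_ignored = function ( string $relative_path ) use ( &$checks ): bool {
    $checks++;
    return 1 === preg_match( '#^/node_modules(/|$)#', $relative_path );
};

// Assumed traversal mode: directories are yielded before their contents.
$iterator = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator( $root, FilesystemIterator::SKIP_DOTS ),
    RecursiveIteratorIterator::SELF_FIRST
);

$included = array();
foreach ( $iterator as $item ) {
    $relative = str_replace( '\\', '/', substr( $item->getPathname(), strlen( $root ) ) );
    if ( $is_ignored( $relative ) ) {
        continue; // Too late: the iterator still descends into this directory.
    }
    if ( $item->isFile() ) {
        $included[] = $relative;
    }
}
sort( $included );

// Clean up the throwaway tree (children first, then the root).
$cleanup = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator( $root, FilesystemIterator::SKIP_DOTS ),
    RecursiveIteratorIterator::CHILD_FIRST
);
foreach ( $cleanup as $path ) {
    $path->isDir() ? rmdir( $path->getPathname() ) : unlink( $path->getPathname() );
}
rmdir( $root );

printf( "checks: %d, included: %s\n", $checks, implode( ', ', $included ) );
```

In this sketch the check runs once for each of the seven entries in the tree, four of which are node_modules itself or live inside it, while only two files are kept.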

Describe how other contributors can replicate this bug

  1. Create a WordPress plugin with a node_modules directory containing 30,000+ files (or any large directory)
  2. Add node_modules to your .distignore file:

     node_modules
     .git
     vendor
     tests

  3. Run wp dist-archive . build.zip from the plugin directory
  4. Observe that the command takes several minutes to complete
  5. The command iterates through all 30,000+ files in node_modules even though the entire directory is ignored
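
If no project with a large node_modules directory is at hand, a synthetic one can be generated. This is a rough sketch under stated assumptions: make_large_fixture() is a hypothetical helper, the pkg-NNN and file-NNNNN.js names are arbitrary, and empty files are assumed to be enough to trigger the slowdown because the cost comes from the number of entries rather than their size.

```php
<?php
/**
 * Hypothetical helper: fills $plugin_dir/node_modules with $count empty files,
 * at most 1,000 per subdirectory, and writes the .distignore from the steps above.
 * Returns the number of files created.
 */
function make_large_fixture( string $plugin_dir, int $count ): int {
    if ( ! is_dir( $plugin_dir ) ) {
        mkdir( $plugin_dir, 0777, true );
    }

    $created = 0;
    for ( $i = 0; $i < $count; $i++ ) {
        $dir = sprintf( '%s/node_modules/pkg-%03d', $plugin_dir, intdiv( $i, 1000 ) );
        if ( ! is_dir( $dir ) ) {
            mkdir( $dir, 0777, true );
        }
        // file_put_contents() returns 0 for an empty file and false on failure.
        if ( false !== file_put_contents( sprintf( '%s/file-%05d.js', $dir, $i ), '' ) ) {
            $created++;
        }
    }

    file_put_contents( $plugin_dir . '/.distignore', "node_modules\n.git\nvendor\ntests\n" );

    return $created;
}
```

Calling make_large_fixture( '/path/to/demo-plugin', 30000 ) would produce a directory matching step 1; a valid plugin header file is still needed for it to count as a plugin.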

Describe what you would expect as the correct outcome

The iterator should skip descending into directories that are marked as ignored in .distignore, avoiding the need to iterate through their contents entirely.

For a project with node_modules containing 30,000 files:

  • Current behavior: 30,000+ iterations and isPathIgnored() checks, taking several minutes
  • Expected behavior: Skip the node_modules directory at the directory level, only iterate through files that need to be processed, completing in seconds

Let us know what environment you are running this on

OS: [Various - affects all operating systems]
PHP version: [Various - affects all PHP versions]
WP-CLI version: [Various - affects current versions using this package]

The issue is present in the current implementation regardless of environment.

Provide a possible solution

The solution is to use a RecursiveFilterIterator to filter out ignored directories before the RecursiveIteratorIterator descends into them. This way, the iterator never enters ignored directories, avoiding thousands of unnecessary iterations.

The filter would check at the directory level: "If this is a directory and it's ignored in .distignore, don't descend into it." This is fundamentally different from the current approach which checks every file after already finding it.
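
A minimal sketch of that idea follows. It is one possible shape rather than a proposed patch: the class name Ignored_Directory_Filter, the relative_path() helper, and the $is_ignored closure (standing in for $this->checker->isPathIgnored()) are all hypothetical, and the traversal mode and temporary directory are assumptions for illustration.

```php
<?php
/**
 * Illustrative filter (hypothetical name): rejects ignored directories so that
 * RecursiveIteratorIterator never asks for their children.
 */
final class Ignored_Directory_Filter extends RecursiveFilterIterator {
    /** @var callable Stand-in for the real ignore check. */
    private $is_ignored;
    /** @var string */
    private $root;

    public function __construct( RecursiveIterator $iterator, string $root, callable $is_ignored ) {
        parent::__construct( $iterator );
        $this->root       = $root;
        $this->is_ignored = $is_ignored;
    }

    public function accept(): bool {
        $item = $this->current();
        if ( ! $item->isDir() ) {
            return true; // Files pass through; the loop body can still check them.
        }
        return ! call_user_func( $this->is_ignored, relative_path( $this->root, $item ) );
    }

    // Child filters need the same root and callback, so build them explicitly.
    public function getChildren(): RecursiveFilterIterator {
        return new self( $this->getInnerIterator()->getChildren(), $this->root, $this->is_ignored );
    }
}

// Hypothetical helper: path relative to $root, with forward slashes.
function relative_path( string $root, SplFileInfo $item ): string {
    return str_replace( '\\', '/', substr( $item->getPathname(), strlen( $root ) ) );
}

// Throwaway tree: one ignored directory plus two files to keep.
$root = sys_get_temp_dir() . '/distignore-filter-demo-' . uniqid();
mkdir( $root . '/node_modules/pkg', 0777, true );
mkdir( $root . '/src', 0777, true );
foreach ( array( 'node_modules/a.js', 'node_modules/pkg/b.js', 'src/main.php', 'readme.txt' ) as $file ) {
    file_put_contents( $root . '/' . $file, '' );
}

$is_ignored = function ( string $relative_path ): bool {
    return 1 === preg_match( '#^/node_modules(/|$)#', $relative_path );
};

$iterator = new RecursiveIteratorIterator(
    new Ignored_Directory_Filter(
        new RecursiveDirectoryIterator( $root, FilesystemIterator::SKIP_DOTS ),
        $root,
        $is_ignored
    ),
    RecursiveIteratorIterator::SELF_FIRST
);

$visited = array();
foreach ( $iterator as $item ) {
    $visited[] = relative_path( $root, $item );
}
sort( $visited );

// Clean up with an unfiltered iterator (children first, then the root).
$cleanup = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator( $root, FilesystemIterator::SKIP_DOTS ),
    RecursiveIteratorIterator::CHILD_FIRST
);
foreach ( $cleanup as $path ) {
    $path->isDir() ? rmdir( $path->getPathname() ) : unlink( $path->getPathname() );
}
rmdir( $root );

echo implode( "\n", $visited ), "\n";
```

Here the loop only ever sees /readme.txt, /src and /src/main.php; nothing under node_modules is yielded. Whether pruning at the directory level interacts correctly with negated patterns in .distignore would need to be checked against the real checker's behavior.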

Provide additional context/Screenshots

This is a well-known performance pitfall with RecursiveIteratorIterator. The iterator flattens the directory tree as it traverses, so a check placed in the loop body can only run after the iterator has already descended into each directory.

Related discussion: The issue was previously mentioned in #81 (comment)

For WordPress plugins with modern build processes (npm packages, composer vendor directories, etc.), this performance issue makes the dist-archive command nearly unusable without manually deleting these directories first, which defeats the purpose of having a .distignore file.

