Add ability to provide initial files to crawl #121

Merged
chrischrischris merged 3 commits into main from crawlfiles
Feb 10, 2026

@chrischrischris
Contributor

Needed for adobe/da-live#243

Previously, crawl took a root dir (the path arg) and crawled everything within that dir and its children. Now we can provide an array of paths to crawl, as well as a list of individual files to crawl.

@aem-code-sync

aem-code-sync bot commented Nov 18, 2025

Hello, I'm the AEM Code Sync Bot and I will run some actions to deploy your branch and validate page speed.
In case there are problems, just click a checkbox below to rerun the respective action.

  • Re-run PSI checks
  • Re-sync branch

auniverseaway previously approved these changes Dec 9, 2025
Member

@auniverseaway auniverseaway left a comment

I think "files" is probably misleading, since it's actually folders you want to traverse.

I would suggest:

  1. paths (as opposed to path)
  2. folders

My only argument against paths is that it's so similar to path, but that may be a feature: if someone passes a string to paths, we could convert it to an array. Same for path: if they pass an array of paths to path, we can treat it as an array.
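
A minimal sketch of the string-or-array conversion suggested above. The helper name toPathArray is hypothetical and is not taken from this PR or from the crawl code. It only illustrates accepting either form and always working with an array internally.

```javascript
// Hypothetical helper (not part of the actual crawl API): accept either a
// single path string or an array of path strings and always return an array.
function toPathArray(input) {
  // Treat a missing value as "no paths".
  if (input == null) return [];
  // Wrap a lone string; pass arrays through unchanged.
  const list = Array.isArray(input) ? input : [input];
  // Keep only non-empty strings so later code can rely on the shape.
  return list.filter((p) => typeof p === 'string' && p.length > 0);
}

console.log(toPathArray('/org/site'));
console.log(toPathArray(['/org/site/a', '/org/site/b']));
```

With a helper like this, path and paths could share one code path regardless of which shape the caller passes.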

@chrischrischris
Contributor Author

@auniverseaway

#118 added the ability to pass either a path string or an array of path strings to the path param.

This PR also allows passing in file objects like:
{ path: '/custom/file1.html', name: 'file1', ext: 'html', lastModified: 123456789 }

This is specifically for the root search that excludes loc content. With the ability to pass paths AND files, we can give crawl all of the root files, and all of the non-loc folders to search at once.

We could just provide file paths instead of objects, but then we'd need to do a daFetch list call for every file path passed in. When calling crawl, we already have the file object, so passing it directly avoids those extra calls.
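
To illustrate the trade-off, here is a rough sketch of a crawl that is seeded with both folder paths and ready-made file objects. It is not the implementation from this PR: crawlSketch and listFolder are hypothetical names, listFolder is an injected stand-in for whatever list call the real crawl makes, and the sketch assumes that folder entries have no ext while file entries do.

```javascript
// Sketch only. `listFolder` is a hypothetical, injected function that resolves
// to an array of entries shaped like the file objects in this thread.
// Assumption: folder entries have no `ext`, file entries do.
async function crawlSketch({ path = [], files = [] }, listFolder) {
  // Accept a single path string or an array of path strings.
  const queue = Array.isArray(path) ? [...path] : [path];
  // Pre-supplied file objects go straight into the results: no list call needed.
  const found = [...files];
  while (queue.length > 0) {
    const folder = queue.shift();
    const items = await listFolder(folder);
    for (const item of items) {
      if (item.ext) found.push(item); // a file: collect it
      else queue.push(item.path); // a folder: traverse it later
    }
  }
  return found;
}

// Fake in-memory tree standing in for the remote listing.
const tree = {
  '/site/docs': [
    { path: '/site/docs/a.html', name: 'a', ext: 'html', lastModified: 1 },
    { path: '/site/docs/sub', name: 'sub' },
  ],
  '/site/docs/sub': [
    { path: '/site/docs/sub/b.html', name: 'b', ext: 'html', lastModified: 2 },
  ],
};
let listCalls = 0;
const fakeList = async (folder) => {
  listCalls += 1;
  return tree[folder] || [];
};

const resultPromise = crawlSketch(
  {
    path: ['/site/docs'],
    files: [{ path: '/site/index.html', name: 'index', ext: 'html', lastModified: 3 }],
  },
  fakeList,
);
resultPromise.then((found) => console.log(found.map((f) => f.path), listCalls));
```

In this sketch the root file costs no list call at all; only the two folders are listed. Passing a bare file path instead would require an extra lookup to recover name, ext, and lastModified.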

@chrischrischris chrischrischris merged commit 6290ab4 into main Feb 10, 2026
3 of 4 checks passed
@chrischrischris chrischrischris deleted the crawlfiles branch February 10, 2026 23:55
@chrischrischris chrischrischris restored the crawlfiles branch February 12, 2026 14:38
@chrischrischris chrischrischris deleted the crawlfiles branch February 12, 2026 14:47