Using find for array jobs #463

@wwarriner

What would you like to see added?

We can use find to determine how many files or directories are present in a given directory.

Files: find <containing-directory> -type f -name "<glob>"
Immediate subdirectories: find <containing-directory> -mindepth 1 -maxdepth 1 -type d -name "<glob>"
Recursive subdirectories: find <containing-directory> -mindepth 1 -type d -name "<glob>"
(more details: https://linuxhostsupport.com/blog/how-to-search-files-on-the-linux-terminal/)
(quotes are needed around the glob so the shell does not expand it before find runs: https://stackoverflow.com/questions/6495501/find-paths-must-precede-expression-how-do-i-specify-a-recursive-search-that/6495536#6495536)
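
As a rough illustration of the three forms above, the sketch below builds a throwaway directory and counts matches with each one. The directory layout, file names, and variable names are invented for the example and are not part of any real project. Note that the "Files" form descends into subdirectories unless -maxdepth is given.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sandbox layout, created only for this illustration.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/run_a/nested" "$demo_dir/run_b"
touch "$demo_dir/s1.csv" "$demo_dir/s2.csv" "$demo_dir/run_a/s3.csv" "$demo_dir/notes.txt"

# Files: recursive by default, so run_a/s3.csv is counted as well.
# tr strips the padding that some wc implementations add around the number.
n_files="$(find "$demo_dir" -type f -name "*.csv" | wc -l | tr -d '[:space:]')"

# Immediate subdirectories: run_a and run_b only.
n_immediate="$(find "$demo_dir" -mindepth 1 -maxdepth 1 -type d -name "run_*" | wc -l | tr -d '[:space:]')"

# Recursive subdirectories: run_a, run_a/nested, and run_b.
n_recursive="$(find "$demo_dir" -mindepth 1 -type d -name "*" | wc -l | tr -d '[:space:]')"

echo "files=$n_files immediate=$n_immediate recursive=$n_recursive"

rm -rf "$demo_dir"
```

With this layout the script should report 3 files, 2 immediate subdirectories, and 3 recursive subdirectories.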

find helps determine how many items an array job should iterate over. Piping its output into wc -l, as in find ... | wc -l, gives the number of matches, which sets the upper limit on array tasks. The value can be used statically, by running the command once and noting the result, or dynamically, in a wrapper script that computes it before submitting the payload script with sbatch.
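
One possible shape for the dynamic approach is sketched below. It assumes zero-based array indices (so the upper bound is the count minus one), and the names array_range and payload.sh are placeholders invented for this example. The sbatch command is only printed, not run, so the sketch works on a machine without Slurm.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print a Slurm --array range covering every file under DIR that matches GLOB.
# Assumption: indices start at 0, so the upper bound is count - 1.
array_range() {
    local dir="$1" glob="$2" count
    count="$(find "$dir" -type f -name "$glob" | wc -l | tr -d '[:space:]')"
    if [ "$count" -eq 0 ]; then
        echo "no files matching '$glob' under '$dir'" >&2
        return 1
    fi
    echo "0-$((count - 1))"
}

# Demo with a throwaway directory; point this at a real data directory instead.
input_dir="$(mktemp -d)"
touch "$input_dir/a.csv" "$input_dir/b.csv" "$input_dir/c.csv"

range="$(array_range "$input_dir" "*.csv")"

# Build the submission command and print it rather than running it.
# payload.sh is a placeholder for the real job script.
submit_cmd="sbatch --array=$range payload.sh"
echo "$submit_cmd"

rm -rf "$input_dir"
```

Inside the payload script, each task would then need a stable way to map SLURM_ARRAY_TASK_ID to one file, for example by sorting the find output so that every task sees the same ordering.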

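We recommend against piping ls into post-processing or data aggregation tools such as wc -l or grep. The output of ls is formatted for people rather than programs, so the combination can produce unexpected results. For example, when a glob passed to ls matches a directory, ls prints that directory's contents along with a header line, which inflates the count.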
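
A small sketch of one way the two approaches can disagree, using an invented directory layout. Here a glob handed to ls matches a subdirectory, so ls expands it into extra lines, while find reports only the entries asked for.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: two files and one subdirectory holding two more files.
demo_dir="$(mktemp -d)"
mkdir "$demo_dir/sub"
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/sub/c.txt" "$demo_dir/sub/d.txt"

# ls with a glob: "sub" is a directory, so ls prints a blank line, a "sub:"
# header, and the contents of sub in addition to a.txt and b.txt.
ls_count="$(cd "$demo_dir" && ls -- * | wc -l | tr -d '[:space:]')"

# find limited to the top level: exactly a.txt, b.txt, and sub.
find_count="$(find "$demo_dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d '[:space:]')"

echo "ls_count=$ls_count find_count=$find_count"

rm -rf "$demo_dir"
```

The directory has three top-level entries, which is what find reports; the ls pipeline counts the extra header, separator, and nested file lines as well.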
