@ahal (Collaborator) commented Mar 25, 2025

Previously we walked the file system and ended up scanning files in, e.g., the .git directory. This can be quite slow on larger repos.

Fixes #663
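
To illustrate the difference described above, here is a minimal sketch, not the actual taskgraph change. It assumes a git checkout and contrasts a plain `os.walk` with asking git for its tracked files via `git ls-files -z`. The function names `walk_all_files`, `parse_ls_files`, and `list_tracked_files` are hypothetical and exist only for this example.

```python
import os
import subprocess


def walk_all_files(root):
    """Naive approach: visit every file under root, including .git internals."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(found)


def parse_ls_files(output):
    """Split NUL-separated `git ls-files -z` output into a sorted list of paths."""
    return sorted(p for p in output.decode("utf-8").split("\0") if p)


def list_tracked_files(root):
    """Ask git for tracked files instead of walking the tree (assumes git is installed)."""
    output = subprocess.check_output(["git", "ls-files", "-z"], cwd=root)
    return parse_ls_files(output)
```

Because git never tracks its own `.git` directory, the second approach skips it entirely, and the returned paths are relative to the directory the command runs in.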

@ahal ahal requested a review from a team March 25, 2025 16:09
@ahal ahal self-assigned this Mar 25, 2025
@ahal ahal requested a review from hneiva March 25, 2025 16:09
@jcristau (Contributor) commented

@glandium fyi

@ahal ahal force-pushed the ahal/push-msspxqnkkmvx branch 2 times, most recently from ef374ea to 3084558 Compare March 25, 2025 16:50
@ahal (Collaborator, Author) commented Mar 25, 2025

Note that gecko_taskgraph already does this, but it uses FileFinder so I couldn't port the fix exactly.

@ahal ahal force-pushed the ahal/push-msspxqnkkmvx branch from 3084558 to 69df132 Compare March 25, 2025 16:58
@ahal ahal merged commit 5143893 into taskcluster:main Mar 28, 2025
17 checks passed
@ahal ahal deleted the ahal/push-msspxqnkkmvx branch March 28, 2025 20:07
@gregtatum (Contributor) commented

Thanks for the work on this; it will be a big quality-of-life improvement for translations!

bhearsum added a commit to bhearsum/taskgraph that referenced this pull request Jul 22, 2025
Since taskcluster#664 we started returning paths relative to the repository root from `_find_matching_files`. This meant that when we join `base_path` with `path` in `hash_paths`, we end up with `base_path` in there twice.

Fixing this without breaking `mozpath.match` means we need to join together the repository root and matched files after `mozpath.match`. This, in turn, requires that some tests are able to call `get_repository`, which requires faking a repository being present.
bhearsum added a commit that referenced this pull request Jul 23, 2025
Since #664 we started returning paths relative to the repository root from `_find_matching_files`. This meant that when we join `base_path` with `path` in `hash_paths`, we end up with `base_path` in there twice.

Fixing this without breaking `mozpath.match` means we need to join together the repository root and matched files after `mozpath.match`. This, in turn, requires that some tests are able to call `get_repository`, which requires faking a repository being present.
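
The doubled `base_path` described in this commit message can be reproduced with a small sketch. This is not the taskgraph code: the values of `repo_root`, `base_path`, and `matched`, and the helpers `join_buggy` and `join_fixed`, are all hypothetical and chosen only to show the shape of the problem.

```python
import posixpath

# Hypothetical values; none of these come from the taskgraph source.
repo_root = "/builds/repo"
base_path = "taskcluster/docker"
# A match as the commit message describes it: relative to the repository root.
matched = "taskcluster/docker/image/Dockerfile"


def join_buggy(base_path, path):
    """Join base_path onto a path that already starts with base_path."""
    return posixpath.join(base_path, path)


def join_fixed(repo_root, path):
    """Join the repository root onto the repo-relative match instead."""
    return posixpath.join(repo_root, path)


print(join_buggy(base_path, matched))
# taskcluster/docker/taskcluster/docker/image/Dockerfile
print(join_fixed(repo_root, matched))
# /builds/repo/taskcluster/docker/image/Dockerfile
```

Under these assumptions, matching still happens on repo-relative paths, and the absolute path is built only afterwards, which is the ordering the commit message calls for.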
Development

Successfully merging this pull request may close these issues.

_get_all_files in util/hash.py very slow on large repos

4 participants