Commit 98ca2a2

⚡️ Speed up method GoogleDriveIndexer.count_files_recursively by 29% (#593)
1 parent b2ff41a commit 98ca2a2

3 files changed, +8 -2 lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+## 1.2.17-dev2
+
+* **Optimize `GoogleDriveIndexer.count_files_recursively`**
+
 ## 1.2.17-dev1

 * **Optimize `MilvusUploadStager.parse_date_string`**

unstructured_ingest/__version__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "1.2.17-dev1" # pragma: no cover
+__version__ = "1.2.17-dev2" # pragma: no cover

unstructured_ingest/processes/connectors/google_drive.py

Lines changed: 3 additions & 1 deletion
@@ -189,6 +189,9 @@ def count_files_recursively(
         """
         count = 0
         stack = [folder_id]
+        # Pre-compute lower-case extension set for O(1) lookup
+        valid_exts = set(e.lower() for e in extensions) if extensions else None
+
         while stack:
             current_folder = stack.pop()
             # Always list all items under the current folder.
@@ -212,7 +215,6 @@ def count_files_recursively(
                         if extensions:
                             # Use a case-insensitive comparison for the file extension.
                             file_ext = (item.get("fileExtension") or "").lower()
-                            valid_exts = [e.lower() for e in extensions]
                             if file_ext in valid_exts:
                                 count += 1
                         else:
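
The change above hoists the lower-casing of `extensions` out of the per-item loop and swaps the list for a set, so each membership test becomes an average O(1) lookup instead of a rebuild plus linear scan. The following is a minimal, self-contained sketch of that pattern, not the connector's actual code: `count_files`, `FAKE_DRIVE`, and the in-memory `listing` dict are hypothetical stand-ins for the Drive API listing calls, which this diff does not show.

```python
from typing import Dict, List, Optional

# MIME type Google Drive uses for folders.
FOLDER_MIME = "application/vnd.google-apps.folder"

# Hypothetical in-memory stand-in for Drive listings: folder id -> child items.
FAKE_DRIVE: Dict[str, List[dict]] = {
    "root": [
        {"id": "a", "mimeType": FOLDER_MIME},
        {"id": "f1", "mimeType": "application/pdf", "fileExtension": "PDF"},
        {"id": "f2", "mimeType": "text/plain", "fileExtension": "txt"},
    ],
    "a": [
        {"id": "f3", "mimeType": "application/pdf", "fileExtension": "pdf"},
        {"id": "f4", "mimeType": "application/octet-stream"},  # no extension
    ],
}


def count_files(
    listing: Dict[str, List[dict]],
    folder_id: str,
    extensions: Optional[List[str]] = None,
) -> int:
    """Count files under folder_id, optionally filtered by extension."""
    count = 0
    stack = [folder_id]
    # Built once, before the traversal, rather than once per file.
    valid_exts = {e.lower() for e in extensions} if extensions else None

    while stack:
        current_folder = stack.pop()
        for item in listing.get(current_folder, []):
            if item.get("mimeType") == FOLDER_MIME:
                # Always descend into sub-folders, regardless of the filter.
                stack.append(item["id"])
            elif valid_exts is None:
                count += 1
            else:
                # Case-insensitive comparison; a missing extension becomes "".
                file_ext = (item.get("fileExtension") or "").lower()
                if file_ext in valid_exts:
                    count += 1
    return count
```

With the sample data, `count_files(FAKE_DRIVE, "root", ["pdf"])` counts both `f1` and `f3`, because the filter and the file extensions are compared in lower case. Passing `None` or an empty list disables the filter, matching the `if extensions else None` guard in the diff.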
