
Commit 641128d

fix string_to_dict usage for windows (#7598)
1 parent b4ef388 commit 641128d


2 files changed: 5 additions, 1 deletion


src/datasets/arrow_dataset.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -3079,7 +3079,9 @@ def load_processed_shard_from_cache(shard_kwargs: dict[str, Any]) -> Dataset:
 cache_file_with_suffix_pattern = cache_file_prefix + suffix_template + cache_file_ext
 
 for cache_file in glob.iglob(f"{cache_file_prefix}*{cache_file_ext}"):
-    suffix_variable_map = string_to_dict(cache_file, cache_file_with_suffix_pattern)
+    suffix_variable_map = string_to_dict(
+        Path(cache_file).as_posix(), Path(cache_file_with_suffix_pattern).as_posix()
+    )
     if suffix_variable_map is not None:
         file_num_proc = int(suffix_variable_map["num_proc"])
         existing_cache_file_map[file_num_proc].append(cache_file)
```
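The loop in this hunk groups the cache files that already exist by their `num_proc` value. Below is a minimal sketch of that grouping, runnable on any OS. It makes several assumptions: `PureWindowsPath` simulates Windows paths, a fixed list stands in for `glob.iglob`, and the `suffix_template` value `"_{rank}_of_{num_proc}"`, the file names, and the `parse_suffix` helper are all hypothetical. Unlike a pattern passed straight to the regex engine, `parse_suffix` escapes the literal segments with `re.escape`. Here, then, the `as_posix()` calls only mirror the fix by keeping both sides in the same separator style.

```python
import re
from collections import defaultdict
from pathlib import PureWindowsPath
from typing import Optional

# Hypothetical values; the real code derives these from the cache file name,
# and the actual suffix_template in datasets may differ.
cache_file_prefix = r"C:\cache\cache-1234"
suffix_template = "_{rank}_of_{num_proc}"
cache_file_ext = ".arrow"
cache_file_with_suffix_pattern = cache_file_prefix + suffix_template + cache_file_ext

# Stand-in for glob.iglob(...) so the example runs without touching disk.
found_files = [
    r"C:\cache\cache-1234_00000_of_00002.arrow",
    r"C:\cache\cache-1234_00001_of_00002.arrow",
    r"C:\cache\cache-1234_00000_of_00004.arrow",
    r"C:\cache\cache-1234.arrow",  # no suffix: should be skipped
]


def parse_suffix(string: str, pattern: str) -> Optional[dict[str, str]]:
    """Hypothetical parser: escape literal text, turn {name} into a named group."""
    # re.split with one capturing group alternates: literal, name, literal, ...
    parts = re.split(r"{(\w+)}", pattern)
    regex = "".join(
        f"(?P<{part}>.+?)" if i % 2 else re.escape(part) for i, part in enumerate(parts)
    )
    match = re.fullmatch(regex, string)
    return match.groupdict() if match else None


existing_cache_file_map: dict[int, list[str]] = defaultdict(list)
pattern_posix = PureWindowsPath(cache_file_with_suffix_pattern).as_posix()
for cache_file in found_files:
    # Normalize the file name to forward slashes too, so both sides match in style.
    suffix_variable_map = parse_suffix(PureWindowsPath(cache_file).as_posix(), pattern_posix)
    if suffix_variable_map is not None:
        file_num_proc = int(suffix_variable_map["num_proc"])
        existing_cache_file_map[file_num_proc].append(cache_file)

print(dict(existing_cache_file_map))
```

A file without the suffix, such as the last entry, yields `None` and is skipped. This matches the `if suffix_variable_map is not None` guard in the diff.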

src/datasets/utils/py_utils.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -175,6 +175,8 @@ def string_to_dict(string: str, pattern: str) -> Optional[dict[str, str]]:
     Args:
         string (str): input string
         pattern (str): pattern formatted like a python f-string
+            This can be a regex - so in case of un-formatting paths you should use posix paths.
+            Otherwise backslashes for windows paths can cause issues.
 
     Returns:
         Optional[dict[str, str]]: dictionary of variable -> value, retrieved from the input using the pattern, or
```
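The new docstring lines point at the underlying hazard. If the pattern's literal text reaches the regex engine unescaped, each Windows backslash starts an escape sequence. The sketch below illustrates this with a hypothetical `unformat` helper written in that style (it is not the `datasets` implementation). It shows both failure modes: an outright `re.error` for `\c`, and a silent non-match for `\d`. It then runs the same inputs after converting them with `as_posix()`.

```python
import re
from pathlib import PureWindowsPath
from typing import Optional


def unformat(string: str, pattern: str) -> Optional[dict[str, str]]:
    """Hypothetical regex-based un-formatter (not the datasets implementation).

    Each "{name}" placeholder becomes a named group; all other pattern text
    is passed to the regex engine as-is, without re.escape.
    """
    regex = re.sub(r"{(\w+)}", r"(?P<\1>.+)", pattern)
    match = re.fullmatch(regex, string)
    return match.groupdict() if match else None


windows_file = r"C:\cache\data_00004.arrow"
windows_pattern = r"C:\cache\data_{num_proc}.arrow"

# "\c" is not a valid regex escape, so compiling the pattern fails.
try:
    unformat(windows_file, windows_pattern)
except re.error as err:
    print("invalid regex:", err)

# "\d" is a valid escape (any digit), so this fails silently.
print(unformat(r"D:\datasets_00004.arrow", r"D:\datasets_{num_proc}.arrow"))  # None

# With forward slashes on both sides, the placeholder is recovered.
posix_file = PureWindowsPath(windows_file).as_posix()
posix_pattern = PureWindowsPath(windows_pattern).as_posix()
print(unformat(posix_file, posix_pattern))  # {'num_proc': '00004'}
```

Escaping the literal segments with `re.escape` would also avoid the problem. The commit instead keeps the pattern usable as a regex and documents that callers should pass POSIX paths.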
