@codeflash-ai codeflash-ai bot commented Jun 1, 2025

📄 589% (5.89x) speedup for _get_ignore_patterns in src/diffusers/pipelines/pipeline_loading_utils.py

⏱️ Runtime : 22.0 milliseconds → 3.19 milliseconds (best of 204 runs)

📝 Explanation and details

Here is an optimized version of your code. The line profiler output shows that the main hot spots are in is_safetensors_compatible, specifically:

  • Filtering filenames by folder_names (a path split inside a set comprehension, run over thousands of files)
  • Splitting filename strings on "/" in a loop
  • os.path.splitext called in a loop for every file

The main optimizations:

  • Avoid repeated string splits, including the splits used for filtering: instead of splitting thousands of times, collect the information in one pass. Use tuple unpacking where possible.
  • Minimize OS/path ops per file: Do as much in one pass as possible, and avoid unneeded splits and compositions.
  • Early return on missing safetensors
  • Use local variable lookups
  • Some set operations are replaced with more efficient list comprehensions when possible (since order doesn't matter), and dicts are built with less branching.
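The one-pass idea in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR or the library's implementation: `group_extensions_by_component` and `every_component_has_safetensors` are hypothetical names, and only `.safetensors` and `.bin` are treated as weight extensions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


def group_extensions_by_component(
    filenames: Iterable[str], folder_names: Optional[Iterable[str]] = None
) -> Dict[str, Set[str]]:
    """Hypothetical helper: map each component folder to the weight extensions found in it."""
    allowed = set(folder_names) if folder_names is not None else None
    by_component = defaultdict(set)
    for filename in filenames:
        # str.partition splits once at the first "/" and never raises.
        folder, sep, rest = filename.partition("/")
        if not sep:
            folder, rest = "", filename  # file sits in the repository root
        elif "/" in rest:
            continue  # more than one "/": skipped in this sketch
        if allowed is not None and folder and folder not in allowed:
            continue  # folder is not one of the requested components
        # Plain suffix checks instead of os.path.splitext.
        if rest.endswith(".safetensors"):
            by_component[folder].add(".safetensors")
        elif rest.endswith(".bin"):
            by_component[folder].add(".bin")
    return dict(by_component)


def every_component_has_safetensors(
    filenames: Iterable[str], folder_names: Optional[Iterable[str]] = None
) -> bool:
    """Return False as soon as one component lacks a .safetensors file."""
    groups = group_extensions_by_component(filenames, folder_names)
    for extensions in groups.values():
        if ".safetensors" not in extensions:
            return False  # early return on the first miss
    return True
```

Each filename is split exactly once, and the folder filter, the depth check, and the extension check all happen in the same loop iteration.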

Code comments are preserved where relevant.

Summary of speedups:

  • Avoid repeated string splits and os.path function calls in hot file loops.
  • Use local sets for faster lookup.
  • Break early when possible, and exploit the fact that the file path structure is simple (at most one "/").
  • Switch to .endswith for extensions in tight loops, which is much faster than os.path.splitext.

You can expect this code to use much less time and memory in is_safetensors_compatible (the main hotspot) for large input lists.
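To check the `.endswith` claim on your own machine, a small comparison along these lines may help. It is only a sketch: the function names, the suffix tuple, and the synthetic filename list are assumptions made for illustration, and the timings will vary by interpreter and platform.

```python
import os
import timeit
from typing import Tuple

# Assumed set of weight suffixes, for illustration only.
WEIGHT_SUFFIXES = (".safetensors", ".bin")


def is_weight_file_splitext(filename: str) -> bool:
    # Builds a (root, ext) tuple on every call, then checks the extension.
    return os.path.splitext(filename)[1] in WEIGHT_SUFFIXES


def is_weight_file_endswith(filename: str) -> bool:
    # str.endswith accepts a tuple of suffixes and allocates nothing.
    return filename.endswith(WEIGHT_SUFFIXES)


# Synthetic input: 1000 weight files and 1000 non-weight files.
FILENAMES = [f"comp{i}/model.safetensors" for i in range(1000)]
FILENAMES += [f"comp{i}/config.json" for i in range(1000)]


def compare_timings(repeats: int = 200) -> Tuple[float, float]:
    """Return (splitext_seconds, endswith_seconds) for `repeats` passes over FILENAMES."""
    t_splitext = timeit.timeit(
        lambda: [is_weight_file_splitext(f) for f in FILENAMES], number=repeats
    )
    t_endswith = timeit.timeit(
        lambda: [is_weight_file_endswith(f) for f in FILENAMES], number=repeats
    )
    return t_splitext, t_endswith


if __name__ == "__main__":
    slow, fast = compare_timings()
    print(f"splitext: {slow:.4f}s  endswith: {fast:.4f}s")
```

The two checks are not strictly equivalent: `os.path.splitext` treats a leading-dot name such as `.bin` as having no extension, while `.endswith` matches it. For ordinary names like `model.safetensors` they agree.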

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 39 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests Details
import os
from typing import List, Optional

# imports
import pytest  # used for our unit tests
from src.diffusers.pipelines.pipeline_loading_utils import _get_ignore_patterns

# unit tests

# -----------------
# BASIC TEST CASES
# -----------------


def test_safetensors_compatible_and_used_no_onnx():
    # use_safetensors and compatible: ignores .bin/.msgpack/.onnx/.pb
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet", "text"],
        model_filenames=[
            "unet/model.safetensors", "text/model.safetensors", "unet/model.bin", "text/model.bin"
        ],
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_safetensors_compatible_and_used_with_onnx():
    # use_safetensors and compatible, but use_onnx True: ignores .bin/.msgpack only
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet", "text"],
        model_filenames=[
            "unet/model.safetensors", "text/model.safetensors", "unet/model.bin", "text/model.bin"
        ],
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=True,
        is_onnx=False,
    ); patterns = codeflash_output

def test_not_using_safetensors_ignores_safetensors():
    # use_safetensors=False: ignores .safetensors/.msgpack/.onnx/.pb
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet", "text"],
        model_filenames=[
            "unet/model.safetensors", "text/model.safetensors", "unet/model.bin", "text/model.bin"
        ],
        use_safetensors=False,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_not_using_safetensors_with_onnx():
    # use_safetensors=False, use_onnx True: ignores .safetensors/.msgpack only
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet", "text"],
        model_filenames=[
            "unet/model.safetensors", "text/model.safetensors", "unet/model.bin", "text/model.bin"
        ],
        use_safetensors=False,
        from_flax=False,
        allow_pickle=False,
        use_onnx=True,
        is_onnx=False,
    ); patterns = codeflash_output

def test_pickle_allowed_skips_compatibility_check():
    # allow_pickle=True, use_safetensors=True, but missing safetensors: should NOT raise
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet", "text"],
        model_filenames=[
            "unet/model.bin", "text/model.bin"
        ],
        use_safetensors=True,
        from_flax=False,
        allow_pickle=True,  # disables safetensors compatibility check
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

# -----------------
# EDGE TEST CASES
# -----------------

def test_missing_safetensors_raises():
    # use_safetensors=True, allow_pickle=False, but no safetensors files: should raise
    with pytest.raises(EnvironmentError) as excinfo:
        _get_ignore_patterns(
            passed_components=[],
            model_folder_names=["unet", "text"],
            model_filenames=[
                "unet/model.bin", "text/model.bin"
            ],
            use_safetensors=True,
            from_flax=False,
            allow_pickle=False,
            use_onnx=False,
            is_onnx=False,
        )

def test_empty_filenames_and_folders():
    # No filenames or folders, use_safetensors=False
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=[],
        model_filenames=[],
        use_safetensors=False,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_empty_filenames_and_folders_with_flax():
    # No filenames or folders, from_flax=True
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=[],
        model_filenames=[],
        use_safetensors=False,
        from_flax=True,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_no_component_folders_but_safetensors_file():
    # No folders, but safetensors file in root
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=[],
        model_filenames=["model.safetensors"],
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_variant_argument_passed_through():
    # variant is passed, but function doesn't use it except in error message
    with pytest.raises(EnvironmentError) as excinfo:
        _get_ignore_patterns(
            passed_components=[],
            model_folder_names=["unet"],
            model_filenames=["unet/model.bin"],
            use_safetensors=True,
            from_flax=False,
            allow_pickle=False,
            use_onnx=False,
            is_onnx=False,
            variant="test-variant"
        )


def test_folder_names_filters_files():
    # Only files in folder_names should be considered for compatibility
    # Only "text" has safetensors, but folder_names=["unet"], so compatibility fails
    with pytest.raises(EnvironmentError):
        _get_ignore_patterns(
            passed_components=[],
            model_folder_names=["unet"],
            model_filenames=[
                "unet/model.bin", "text/model.safetensors"
            ],
            use_safetensors=True,
            from_flax=False,
            allow_pickle=False,
            use_onnx=False,
            is_onnx=False,
        )

def test_is_onnx_overrides_use_onnx():
    # use_onnx=None, is_onnx=True: should not ignore onnx/pb
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet"],
        model_filenames=["unet/model.safetensors"],
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=None,
        is_onnx=True,
    ); patterns = codeflash_output

def test_files_with_extra_path_separators():
    # Files with more than one '/' are ignored in safetensors compatibility
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet"],
        model_filenames=[
            "unet/model.safetensors",
            "unet/extra/model.bin",  # should be ignored by compatibility check
        ],
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

# -----------------
# LARGE SCALE TEST CASES
# -----------------

def test_large_number_of_components_safetensors_compatible():
    # 500 components, all have .safetensors files
    folder_names = [f"comp{i}" for i in range(500)]
    filenames = [f"{folder}/model.safetensors" for folder in folder_names]
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=folder_names,
        model_filenames=filenames,
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_large_number_of_components_missing_one_safetensors():
    # 500 components, one missing .safetensors file, should raise
    folder_names = [f"comp{i}" for i in range(500)]
    filenames = [f"{folder}/model.safetensors" for folder in folder_names[:-1]]  # last one missing
    filenames.append(f"{folder_names[-1]}/model.bin")
    with pytest.raises(EnvironmentError):
        _get_ignore_patterns(
            passed_components=[],
            model_folder_names=folder_names,
            model_filenames=filenames,
            use_safetensors=True,
            from_flax=False,
            allow_pickle=False,
            use_onnx=False,
            is_onnx=False,
        )

def test_large_number_of_files_with_pickle_allowed():
    # 1000 files, allow_pickle=True: should never raise, even if no safetensors
    folder_names = [f"comp{i}" for i in range(1000)]
    filenames = [f"{folder}/model.bin" for folder in folder_names]
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=folder_names,
        model_filenames=filenames,
        use_safetensors=True,
        from_flax=False,
        allow_pickle=True,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_large_number_of_files_not_using_safetensors():
    # 1000 files, use_safetensors=False: should ignore safetensors/msgpack/onnx/pb
    folder_names = [f"comp{i}" for i in range(1000)]
    filenames = [f"{folder}/model.bin" for folder in folder_names]
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=folder_names,
        model_filenames=filenames,
        use_safetensors=False,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_large_number_of_files_with_from_flax():
    # 1000 files, from_flax=True: always ignores all
    folder_names = [f"comp{i}" for i in range(1000)]
    filenames = [f"{folder}/model.safetensors" for folder in folder_names]
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=folder_names,
        model_filenames=filenames,
        use_safetensors=True,
        from_flax=True,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import os
from typing import List, Optional

# imports
import pytest  # used for our unit tests
from src.diffusers.pipelines.pipeline_loading_utils import _get_ignore_patterns

# unit tests

# -------------------------------
# Basic Test Cases
# -------------------------------

def test_flax_mode_ignores_all_weights():
    # If from_flax is True, all weight formats should be ignored
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=[],
        model_filenames=["unet/model.safetensors", "unet/model.bin", "unet/model.onnx", "unet/model.pb"],
        use_safetensors=True,
        from_flax=True,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_safetensors_compatible_and_use_safetensors():
    # If safetensors are compatible and use_safetensors is True, .bin and .msgpack should be ignored
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet"],
        model_filenames=["unet/model.safetensors"],
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_safetensors_compatible_and_use_safetensors_with_onnx():
    # If safetensors are compatible, use_safetensors is True, and use_onnx is True, .onnx and .pb should NOT be ignored
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet"],
        model_filenames=["unet/model.safetensors"],
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=True,
        is_onnx=False,
    ); patterns = codeflash_output

def test_not_safetensors_compatible_and_use_safetensors_raises():
    # If use_safetensors is True, allow_pickle is False, and no safetensors present, should raise
    with pytest.raises(EnvironmentError):
        _get_ignore_patterns(
            passed_components=[],
            model_folder_names=["unet"],
            model_filenames=["unet/model.bin"],
            use_safetensors=True,
            from_flax=False,
            allow_pickle=False,
            use_onnx=False,
            is_onnx=False,
        )

def test_not_safetensors_compatible_but_allow_pickle():
    # If use_safetensors is True, but allow_pickle is True, should NOT raise, should ignore safetensors and msgpack
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet"],
        model_filenames=["unet/model.bin"],
        use_safetensors=True,
        from_flax=False,
        allow_pickle=True,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_default_to_safetensors_false():
    # If use_safetensors is False, should ignore safetensors and msgpack
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet"],
        model_filenames=["unet/model.bin"],
        use_safetensors=False,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_default_to_safetensors_false_with_onnx():
    # If use_safetensors is False and use_onnx is True, .onnx and .pb should NOT be ignored
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet"],
        model_filenames=["unet/model.bin"],
        use_safetensors=False,
        from_flax=False,
        allow_pickle=False,
        use_onnx=True,
        is_onnx=False,
    ); patterns = codeflash_output

def test_safetensors_compatible_with_passed_components():
    # If a component is passed in passed_components, it should be ignored for safetensors compatibility
    codeflash_output = _get_ignore_patterns(
        passed_components=["unet"],
        model_folder_names=["unet", "vae"],
        model_filenames=["unet/model.safetensors", "vae/model.safetensors"],
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

# -------------------------------
# Edge Test Cases
# -------------------------------

def test_empty_model_filenames_and_folders():
    # If no files or folders, should default to ignore safetensors/msgpack/onnx/pb
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=[],
        model_filenames=[],
        use_safetensors=False,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_empty_model_filenames_with_safetensors_and_raise():
    # If use_safetensors True, allow_pickle False, but no files, should raise
    with pytest.raises(EnvironmentError):
        _get_ignore_patterns(
            passed_components=[],
            model_folder_names=[],
            model_filenames=[],
            use_safetensors=True,
            from_flax=False,
            allow_pickle=False,
            use_onnx=False,
            is_onnx=False,
        )

def test_no_component_folders_but_safetensors_file():
    # If no component folders, but a safetensors file exists at root, should be compatible
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=[],
        model_filenames=["model.safetensors"],
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_variant_argument_passed():
    # Variant argument should not affect ignore patterns logic directly
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet"],
        model_filenames=["unet/model.safetensors"],
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
        variant="my-variant"
    ); patterns = codeflash_output

def test_files_with_unusual_extensions():
    # Files with unknown extensions should not affect ignore patterns
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet"],
        model_filenames=["unet/model.unknown"],
        use_safetensors=False,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_files_with_multiple_dots():
    # Filenames with multiple dots should not break extension parsing
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet"],
        model_filenames=["unet/model.v1.2.safetensors"],
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_component_folder_filtering():
    # Only files in model_folder_names should be considered for compatibility
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet"],
        model_filenames=["unet/model.safetensors", "vae/model.safetensors"],
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_component_with_no_files():
    # If component folder is listed but has no files, should not be compatible for safetensors
    with pytest.raises(EnvironmentError):
        _get_ignore_patterns(
            passed_components=[],
            model_folder_names=["unet"],
            model_filenames=[],
            use_safetensors=True,
            from_flax=False,
            allow_pickle=False,
            use_onnx=False,
            is_onnx=False,
        )

def test_is_onnx_true_overrides_use_onnx_false():
    # If is_onnx is True, .onnx and .pb should NOT be ignored, even if use_onnx is False
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=["unet"],
        model_filenames=["unet/model.bin"],
        use_safetensors=False,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=True,
    ); patterns = codeflash_output

# -------------------------------
# Large Scale Test Cases
# -------------------------------

def test_many_components_safetensors_compatible():
    # 100 components, each with a safetensors file, should be compatible
    num_components = 100
    model_folder_names = [f"comp{i}" for i in range(num_components)]
    model_filenames = [f"comp{i}/model.safetensors" for i in range(num_components)]
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=model_folder_names,
        model_filenames=model_filenames,
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output


def test_many_files_with_mixed_extensions():
    # 500 files, half .bin, half .safetensors, use_safetensors True, should be compatible
    num_files = 500
    model_folder_names = ["unet"]
    safetensors_files = [f"unet/model_{i}.safetensors" for i in range(num_files//2)]
    bin_files = [f"unet/model_{i}.bin" for i in range(num_files//2)]
    model_filenames = safetensors_files + bin_files
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=model_folder_names,
        model_filenames=model_filenames,
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_large_passed_components():
    # 1000 components, all passed in passed_components, so compatibility check is skipped and should not raise
    num_components = 1000
    model_folder_names = [f"comp{i}" for i in range(num_components)]
    model_filenames = [f"comp{i}/model.bin" for i in range(num_components)]
    passed_components = list(model_folder_names)
    codeflash_output = _get_ignore_patterns(
        passed_components=passed_components,
        model_folder_names=model_folder_names,
        model_filenames=model_filenames,
        use_safetensors=True,
        from_flax=False,
        allow_pickle=True,  # allow_pickle True, so no error even if not safetensors compatible
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output

def test_large_scale_with_mixed_folder_names():
    # 500 folders, each with a safetensors and a bin file, use_safetensors True
    num_folders = 500
    model_folder_names = [f"folder_{i}" for i in range(num_folders)]
    model_filenames = []
    for folder in model_folder_names:
        model_filenames.append(f"{folder}/model.safetensors")
        model_filenames.append(f"{folder}/model.bin")
    codeflash_output = _get_ignore_patterns(
        passed_components=[],
        model_folder_names=model_folder_names,
        model_filenames=model_filenames,
        use_safetensors=True,
        from_flax=False,
        allow_pickle=False,
        use_onnx=False,
        is_onnx=False,
    ); patterns = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-_get_ignore_patterns-mbdbuzp3, commit your edits, and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jun 1, 2025
@codeflash-ai codeflash-ai bot requested a review from aseembits93 June 1, 2025 07:16