
Conversation

brchristian
Contributor

In src/datasets/load.py, we can use unpacking rather than concatenating two lists for improved time and memory performance. It’s a small improvement in absolute terms, but a consistent and measurable one:

- ALL_ALLOWED_EXTENSIONS = list(_EXTENSION_TO_MODULE.keys()) + [".zip"]
+ ALL_ALLOWED_EXTENSIONS = [*_EXTENSION_TO_MODULE.keys(), ".zip"]

Benchmarking shows roughly a 32.3% time improvement and a 30.6% memory reduction.
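The difference is visible in the bytecode: the concatenation form materializes an intermediate list from `keys()` plus a one-element list before `+` allocates the final result, while the unpacking form extends a single list in place. A quick way to see this (a sketch, using a toy two-entry stand-in for `_EXTENSION_TO_MODULE`; `dis` output varies between Python versions):

```python
import dis

# Toy stand-in for _EXTENSION_TO_MODULE, just for illustration
_EXTENSION_TO_MODULE = {".csv": "csv", ".json": "json"}

def concat():
    # Builds list(keys()), builds [".zip"], then + allocates a third list
    return list(_EXTENSION_TO_MODULE.keys()) + [".zip"]

def unpack():
    # Builds one list and extends/appends into it directly
    return [*_EXTENSION_TO_MODULE.keys(), ".zip"]

# Both produce the same list; only the intermediate allocations differ
assert concat() == unpack()

dis.dis(concat)
dis.dis(unpack)
```

On CPython 3.9+ the unpacking version compiles to `BUILD_LIST`/`LIST_EXTEND` with no temporary lists, which is where the time and memory savings come from.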

Example benchmarking script:

#!/usr/bin/env python3
"""
Benchmark script to test performance of list(_EXTENSION_TO_MODULE.keys()) vs [*_EXTENSION_TO_MODULE.keys()]
"""
import time
import tracemalloc
from statistics import mean, stdev

# Simulate _EXTENSION_TO_MODULE - based on actual size from datasets
_EXTENSION_TO_MODULE = {
    f".ext{i}": f"module{i}" for i in range(20)  # Realistic size
}

def method_old():
    """Current implementation using list()"""
    return list(_EXTENSION_TO_MODULE.keys()) + [".zip"]

def method_new():
    """Proposed implementation using unpacking"""
    return [*_EXTENSION_TO_MODULE.keys(), ".zip"]

def benchmark_time(func, iterations=100000):
    """Benchmark execution time"""
    times = []
    for _ in range(10):  # Multiple runs for accuracy
        start = time.perf_counter()
        for _ in range(iterations):
            func()
        end = time.perf_counter()
        times.append((end - start) / iterations * 1_000_000)  # microseconds
    
    return mean(times), stdev(times)

def benchmark_memory(func):
    """Benchmark peak memory usage"""
    tracemalloc.start()
    func()
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

if __name__ == "__main__":
    print("Benchmarking list() vs unpacking performance...\n")
    
    # Time benchmarks
    old_time, old_std = benchmark_time(method_old)
    new_time, new_std = benchmark_time(method_new)
    
    print("Time Performance (µs per operation):")
    print(f"  list() approach:     {old_time:.3f} ± {old_std:.3f}")
    print(f"  unpacking approach:  {new_time:.3f} ± {new_std:.3f}")
    print(f"  Improvement:         {((old_time - new_time) / old_time * 100):.1f}% faster")
    
    # Memory benchmarks
    old_mem = benchmark_memory(method_old)
    new_mem = benchmark_memory(method_new)
    
    print("\nMemory Usage (bytes):")
    print(f"  list() approach:     {old_mem}")
    print(f"  unpacking approach:  {new_mem}")
    print(f"  Reduction:           {old_mem - new_mem} bytes ({((old_mem - new_mem) / old_mem * 100):.1f}% less)")
    
    # Verify identical results
    assert method_old() == method_new(), "Results should be identical!"
    print("\n✓ Both methods produce identical results")

Results:

Benchmarking list() vs unpacking performance...

Time Performance (µs per operation):
  list() approach:     0.213 ± 0.020
  unpacking approach:  0.144 ± 0.002
  Improvement:         32.3% faster

Memory Usage (bytes):
  list() approach:     392
  unpacking approach:  272
  Reduction:           120 bytes (30.6% less)

✓ Both methods produce identical results
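As a cross-check, the standard library's `timeit` module can run the same comparison without a hand-rolled timing loop (a sketch; the 20-entry dict mirrors the stand-in used in the script above, and absolute numbers will vary by machine and Python version):

```python
import timeit

# Same 20-entry stand-in for _EXTENSION_TO_MODULE as in the benchmark script
_EXTENSION_TO_MODULE = {f".ext{i}": f"module{i}" for i in range(20)}

N = 100_000  # iterations per measurement

concat_s = timeit.timeit(
    "list(_EXTENSION_TO_MODULE.keys()) + ['.zip']",
    globals=globals(), number=N,
)
unpack_s = timeit.timeit(
    "[*_EXTENSION_TO_MODULE.keys(), '.zip']",
    globals=globals(), number=N,
)

print(f"concat:  {concat_s / N * 1e6:.3f} µs/op")
print(f"unpack:  {unpack_s / N * 1e6:.3f} µs/op")
```

In informal runs this shows the same direction of improvement as the script above, though the exact percentage depends on dict size and interpreter version.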

Use unpacking rather than concatenation for improved time and memory performance.