Performance: Parallelize data source fetching with structured concurrency #26

@leogdion

Problem

DataSourcePipeline fetches its data sources one after another (lines 50-71), so total sync time is the sum of the individual fetch times even though the fetches are independent of each other.

Current implementation:

do {
    restoreImages = try await fetchRestoreImages(options: options)
} catch { throw error }
do {
    xcodeVersions = try await fetchXcodeVersions(options: options)
} catch { throw error }

Proposed Solution

Use structured concurrency to fetch independent sources in parallel:

async let restoreImages = fetchRestoreImages(options: options)
async let xcodeVersions = fetchXcodeVersions(options: options)
async let swiftVersions = fetchSwiftVersions(options: options)

let (images, xcode, swift) = try await (restoreImages, xcodeVersions, swiftVersions)
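
To make the proposal concrete, here is a minimal, self-contained sketch of the same pattern. The `FetchOptions` and `FetchResults` types, the `[String]` return values, and the stub fetch functions are hypothetical stand-ins; the real pipeline's types and signatures are not shown in this issue and will differ.

```swift
import Foundation

// Hypothetical stand-in for the pipeline's real options type.
struct FetchOptions: Sendable {}

// Hypothetical stubs: each one simulates a network fetch with a short sleep.
func fetchRestoreImages(options: FetchOptions) async throws -> [String] {
    try await Task.sleep(nanoseconds: 50_000_000)
    return ["restore-image-a"]
}

func fetchXcodeVersions(options: FetchOptions) async throws -> [String] {
    try await Task.sleep(nanoseconds: 80_000_000)
    return ["xcode-a"]
}

func fetchSwiftVersions(options: FetchOptions) async throws -> [String] {
    try await Task.sleep(nanoseconds: 30_000_000)
    return ["swift-a"]
}

// Hypothetical container for the combined results.
struct FetchResults: Sendable {
    var restoreImages: [String]
    var xcodeVersions: [String]
    var swiftVersions: [String]
}

func fetchAllSources(options: FetchOptions) async throws -> FetchResults {
    // Each `async let` starts a child task right away, so the three fetches overlap.
    async let restoreImages = fetchRestoreImages(options: options)
    async let xcodeVersions = fetchXcodeVersions(options: options)
    async let swiftVersions = fetchSwiftVersions(options: options)

    // The tuple is awaited left to right. If any fetch throws, the error
    // propagates from here and the child tasks not yet awaited are cancelled
    // when the function's scope exits.
    let (images, xcode, swift) = try await (restoreImages, xcodeVersions, swiftVersions)
    return FetchResults(restoreImages: images, xcodeVersions: xcode, swiftVersions: swift)
}
```

One behavior change to keep in mind: with this shape a failure in any single source fails the whole call. If a failed source should not discard the others, each child task could capture its own error (for example in a `Result`) instead of throwing.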

Additionally:

  • Add size limits for in-memory deduplication to prevent excessive memory usage
  • Consider streaming/chunking for large datasets
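
One possible shape for the two points above, sketched under assumptions: `BoundedDeduplicator`, `DeduplicationError`, and `deduplicate(_:chunkSize:limit:key:handleChunk:)` are hypothetical names that do not exist in the project, and throwing when the limit is reached is only one policy (evicting old keys or spilling to disk are alternatives). For simplicity the sketch takes an array as input; a real streaming version would more likely consume an `AsyncSequence`.

```swift
// Hypothetical error raised when the deduplication set would grow past its cap.
enum DeduplicationError: Error, Equatable {
    case capacityExceeded(limit: Int)
}

// Hypothetical helper: tracks seen keys, but never more than `limit` of them.
struct BoundedDeduplicator<Key: Hashable> {
    let limit: Int
    private(set) var seen: Set<Key> = []

    init(limit: Int) {
        self.limit = limit
    }

    // Returns true for a new key and false for a duplicate.
    // Throws if remembering a new key would exceed the limit.
    mutating func insert(_ key: Key) throws -> Bool {
        if seen.contains(key) { return false }
        guard seen.count < limit else {
            throw DeduplicationError.capacityExceeded(limit: limit)
        }
        seen.insert(key)
        return true
    }
}

// Walks the records in fixed-size chunks and hands each chunk's unique
// records to `handleChunk`, so only one chunk of output is buffered at a time.
func deduplicate<Record, Key: Hashable>(
    _ records: [Record],
    chunkSize: Int,
    limit: Int,
    key: (Record) -> Key,
    handleChunk: ([Record]) -> Void
) throws {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var deduplicator = BoundedDeduplicator<Key>(limit: limit)
    var start = 0
    while start < records.count {
        let end = min(start + chunkSize, records.count)
        var unique: [Record] = []
        for record in records[start..<end] {
            if try deduplicator.insert(key(record)) {
                unique.append(record)
            }
        }
        // Skip chunks that contained only duplicates.
        if !unique.isEmpty { handleChunk(unique) }
        start = end
    }
}
```

The set of seen keys still grows with the number of distinct records, which is why the cap matters: chunking bounds the output buffer, while the limit bounds the deduplication state.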

Impact

Performance: Reduce sync time from sum(all sources) to max(source time)

Example: If 3 sources take 5s, 8s, and 3s:

  • Current: 16 seconds total
  • Parallel: about 8 seconds total, bounded by the slowest source (50% less time)
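
The arithmetic behind this example can be checked with a short sketch. The function names are hypothetical, and the model is idealized: it ignores task-scheduling overhead and assumes the sources do not compete for bandwidth or rate limits.

```swift
// Sequential fetching costs the sum of the individual durations.
func sequentialTime(_ durations: [Double]) -> Double {
    durations.reduce(0, +)
}

// Ideal parallel fetching costs only as much as the slowest source.
func parallelTime(_ durations: [Double]) -> Double {
    durations.max() ?? 0
}

let durations = [5.0, 8.0, 3.0]                              // seconds per source
let sequential = sequentialTime(durations)                   // 5 + 8 + 3
let parallel = parallelTime(durations)                       // max(5, 8, 3)
let reduction = (sequential - parallel) / sequential * 100   // percent of time saved

print("sequential: \(sequential)s, parallel: \(parallel)s, saved: \(reduction)%")
```

The saving depends on how evenly the durations are spread: three sources of equal duration would save about 67%, while one dominant source leaves little to gain.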

Files Affected

  • Examples/Bushel/Sources/BushelImages/DataSources/DataSourcePipeline.swift:50-71
  • Deduplication methods (potential memory issues)

Labels

  • enhancement: New feature or request