Single-entity export lacks page-level parallelism #503

@joshsmithxrm

Problem

ParallelExporter parallelizes across entities, not within a single entity. For a single-entity export (or a multi-entity export dominated by one large table), paging is purely sequential:

// src/PPDS.Migration/Export/ParallelExporter.cs:118
// Parallelism is at entity level only
await Parallel.ForEachAsync(schema.Entities, ...)

Each entity is exported with sequential paging:

Page 1 (5000) → wait → Page 2 (5000) → wait → ... → Page N → done
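
The chain above can be sketched as a loop in which each request needs the cookie returned by the previous one. This is a minimal illustration, not the PPDS implementation: the `Page` record and the `fetchPage` delegate are hypothetical stand-ins for the real Dataverse call.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical shape of one page of results; not a PPDS or SDK type.
public sealed record Page(IReadOnlyList<string> Records, string? PagingCookie, bool MoreRecords);

public static class SequentialPaging
{
    // fetchPage is a stand-in for the real service call: (pageNumber, cookie) -> page.
    public static async Task<List<string>> ExportAllAsync(
        Func<int, string?, Task<Page>> fetchPage)
    {
        var all = new List<string>();
        string? cookie = null;   // the first page is requested without a cookie
        var pageNumber = 1;

        while (true)
        {
            // Page N cannot be requested until page N-1 has returned its cookie,
            // so the requests for one entity form a strict chain.
            var page = await fetchPage(pageNumber, cookie);
            all.AddRange(page.Records);

            if (!page.MoreRecords) break;
            cookie = page.PagingCookie;
            pageNumber++;
        }

        return all;
    }
}
```

Because every iteration awaits the previous result, adding threads does not help here; the work has to be split some other way before it can run in parallel.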

Performance Comparison

Tool    | Time for 269K records | Throughput   | Method
SQL4CDS | 26-36 sec             | ~8,000 rec/s | 48-thread partitioned
PPDS    | 2:43 (163 sec)        | ~1,650 rec/s | Sequential paging
PPDS is roughly 5x slower than SQL4CDS for single-entity exports (163 sec vs 26-36 sec).

Root Cause

FetchXML paging requires the paging cookie from the previous page, so page N+1 cannot be requested until page N has returned. That rules out simply fetching pages in parallel, but SQL4CDS demonstrates that alternatives exist.

Possible Solutions

1. Range-based partitioning (like SQL4CDS)

Split by primary key ranges and fetch in parallel:

var ranges = await GetPrimaryKeyRanges(entityName, partitionCount);
// Parallel.ForEachAsync passes (item, CancellationToken) to the body
await Parallel.ForEachAsync(ranges, async (range, ct) => {
    var fetchXml = BuildFetchXmlWithKeyRange(entity, range.Min, range.Max);
    // Fetch all records in this range with sequential paging
});
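
A self-contained sketch of the partitioning step follows. Note the swap: it splits on `createdon` date ranges rather than primary key ranges, because how GUID keys order under range conditions is not established in this issue. `RangePartitioner`, `Split`, and `BuildFetchXml` are hypothetical names, and the date format passed to FetchXML is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class RangePartitioner
{
    // Splits [min, max) into `count` contiguous, non-overlapping half-open ranges.
    public static List<(DateTime Start, DateTime End)> Split(DateTime min, DateTime max, int count)
    {
        if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
        if (max <= min) throw new ArgumentException("max must be after min", nameof(max));

        var ranges = new List<(DateTime Start, DateTime End)>(count);
        var step = (max - min).Ticks / count;
        var start = min;

        for (var i = 0; i < count; i++)
        {
            // The last range absorbs the rounding remainder so the union is exactly [min, max).
            var end = i == count - 1 ? max : start.AddTicks(step);
            ranges.Add((start, end));
            start = end;
        }

        return ranges;
    }

    // Builds a FetchXML query for one partition (assumption: createdon is populated
    // on every row and the service accepts round-trip formatted date values).
    public static string BuildFetchXml(string entity, DateTime start, DateTime end, int pageSize = 5000) =>
        $"<fetch count=\"{pageSize}\">" +
        $"<entity name=\"{entity}\"><all-attributes />" +
        "<filter type=\"and\">" +
        $"<condition attribute=\"createdon\" operator=\"ge\" value=\"{start:o}\" />" +
        $"<condition attribute=\"createdon\" operator=\"lt\" value=\"{end:o}\" />" +
        "</filter></entity></fetch>";
}
```

Half-open ranges (`ge` start, `lt` end) keep a record that falls exactly on a boundary from being exported twice. Equal-width ranges can be badly skewed when records cluster in time, for example after a bulk import, so a real implementation would probably size partitions by record count.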

2. Offset paging

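Get the total count first, then fetch pages in parallel by page offset. FetchXML has no raw offset; the equivalent is cookie-less paging with the page and count attributes, which, as far as I know, Dataverse caps at 50,000 records per query, so this may not cover a 269K-record table on its own: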

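var total = await GetTotalCount(entity);
var pageCount = (int)((total + pageSize - 1) / pageSize); // ceiling division
// Parallel.ForEachAsync passes (item, CancellationToken) to the body
await Parallel.ForEachAsync(Enumerable.Range(0, pageCount), async (pageNum, ct) => {
    var fetchXml = BuildFetchXmlWithOffset(entity, pageNum * pageSize, pageSize);
    // Fetch this single page
});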
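
The page arithmetic and the per-page query might look like the sketch below. `PagePlanner` and its members are hypothetical, and the caller supplies the order attribute because primary key names vary by entity. It uses the FetchXML `page` and `count` attributes without a paging cookie.

```csharp
using System;

public static class PagePlanner
{
    // Ceiling division: number of pages needed to cover `total` records.
    public static int PageCount(long total, int pageSize)
    {
        if (total < 0) throw new ArgumentOutOfRangeException(nameof(total));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (int)((total + pageSize - 1) / pageSize);
    }

    // FetchXML page numbers are 1-based. A stable order is needed so that
    // independently fetched pages do not overlap or skip records.
    public static string BuildFetchXml(string entity, string orderAttribute, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        return
            $"<fetch count=\"{pageSize}\" page=\"{pageNumber}\">" +
            $"<entity name=\"{entity}\"><all-attributes />" +
            $"<order attribute=\"{orderAttribute}\" />" +
            "</entity></fetch>";
    }
}
```

For the 269K-record case in this issue, `PagePlanner.PageCount(269_000, 5000)` gives 54 pages. Records inserted or deleted while the export runs can shift page boundaries, which is one reason range partitioning may be the safer option.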

3. Hybrid approach

Use sequential paging for small exports (<10K records) and parallel partitioning for large exports.
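
A minimal sketch of the selection logic, assuming the record count is known up front. `ExportStrategy` and `StrategySelector` are hypothetical names; the 10K threshold is the one suggested above and would presumably become a setting in ExportOptions.

```csharp
public enum ExportStrategy
{
    SequentialPaging,
    ParallelPartitions
}

public static class StrategySelector
{
    // Threshold taken from the proposal above; not a measured break-even point.
    public const long DefaultThreshold = 10_000;

    // Small exports stay on the simple sequential path; large ones get partitioned.
    public static ExportStrategy Choose(long recordCount, long threshold = DefaultThreshold) =>
        recordCount < threshold
            ? ExportStrategy.SequentialPaging
            : ExportStrategy.ParallelPartitions;
}
```

Keeping the small-export path sequential avoids paying for a count query and partition setup when two pages would have finished the job anyway.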

Files

  • src/PPDS.Migration/Export/ParallelExporter.cs
  • src/PPDS.Migration/Export/ExportOptions.cs (add partition settings)

Acceptance Criteria

  • Single-entity export of 269K records should complete in <1 minute (vs current 2:43)
  • Throughput should be >5,000 rec/s (vs current ~1,650 rec/s)
