Skip to content

Export OOM reduce the surrogateIdRangeSize and increase range count#5540

Draft
apurvabhaleMS wants to merge 6 commits intomainfrom
personal/abhale/export-oom-lists
Draft

Export OOM reduce the surrogateIdRangeSize and increase range count#5540
apurvabhaleMS wants to merge 6 commits intomainfrom
personal/abhale/export-oom-lists

Conversation

@apurvabhaleMS
Copy link
Copy Markdown
Contributor

Description

Describe the changes in this PR.

Related issues

Addresses [issue #].

Testing

Describe how this change was tested.

FHIR Team Checklist

  • Update the title of the PR to be succinct and less than 65 characters
  • Add a milestone to the PR for the sprint that it is merged (i.e. add S47)
  • Tag the PR with the type of update: Bug, Build, Dependencies, Enhancement, New-Feature or Documentation
  • Tag the PR with Open source, Azure API for FHIR (CosmosDB or common code) or Azure Healthcare APIs (SQL or common code) to specify where this change is intended to be released.
  • Tag the PR with Schema Version backward compatible or Schema Version backward incompatible or Schema Version unchanged if this adds or updates Sql script which is/is not backward compatible with the code.
  • When changing or adding behavior, if your code modifies the system design or changes design assumptions, please create and include an ADR.
  • CI is green before merge Build Status
  • Review squash-merge requirements

Semver Change (docs)

Patch|Skip|Feature|Breaking (reason)

Assert.Equal(OperationStatus.Completed, _exportJobRecord.Status);

// Verify all 6 resources were written (2 per range * 3 ranges)
Assert.True(_exportJobRecord.Output.ContainsKey(KnownResourceTypes.Patient));
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.06%. Comparing base (a894a3b) to head (3598d0e).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5540      +/-   ##
==========================================
+ Coverage   77.02%   77.06%   +0.03%     
==========================================
  Files         983      984       +1     
  Lines       36007    36110     +103     
  Branches     5469     5484      +15     
==========================================
+ Hits        27736    27828      +92     
- Misses       6927     6932       +5     
- Partials     1344     1350       +6     

see 9 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR aims to reduce SQL export out-of-memory risk by splitting each export job’s surrogate ID range into many smaller sub-ranges, then processing those sub-ranges sequentially within a single processing job.

Changes:

  • Split orchestrated SQL export ranges into smaller sub-ranges and attach them to ExportJobRecord.
  • Update export execution to iterate sub-ranges sequentially, defer final file commit, and add memory diagnostics.
  • Add supporting model/config changes and happy-path unit tests for sub-range handling.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
src/Microsoft.Health.Fhir.SqlServer/Features/Operations/Export/SqlExportOrchestratorJob.cs Splits larger export ranges into smaller batches and queues grouped processing records.
src/Microsoft.Health.Fhir.Core/Features/Operations/JobRecordProperties.cs Adds serialized property name for sub-range metadata.
src/Microsoft.Health.Fhir.Core/Features/Operations/Export/Models/ExportSurrogateIdRange.cs Introduces the serialized start/end range model.
src/Microsoft.Health.Fhir.Core/Features/Operations/Export/Models/ExportJobRecord.cs Stores sub-range lists on export job records.
src/Microsoft.Health.Fhir.Core/Features/Operations/Export/ExportProcessingJob.cs Refactors execution/result handling around export task invocation.
src/Microsoft.Health.Fhir.Core/Features/Operations/Export/ExportJobTask.cs Processes multiple sub-ranges sequentially and changes commit/logging behavior.
src/Microsoft.Health.Fhir.Core/Configs/ExportJobConfiguration.cs Adds the shared range-reduction constant.
src/Microsoft.Health.Fhir.Core.UnitTests/Features/Operations/Export/ExportProcessingJobTests.cs Adds tests for passing sub-ranges through processing jobs.
src/Microsoft.Health.Fhir.Core.UnitTests/Features/Operations/Export/ExportJobTaskTests.cs Adds happy-path tests for executing sub-ranges in one task.

}

ExportJobProgress progress = _exportJobRecord.Progress;
if (_exportJobRecord.SurrogateIdRanges != null && _exportJobRecord.SurrogateIdRanges.Count > 0)
// Override _count per sub-range to match the range size, ensuring SQL returns at most
// that many resources per call. This prevents OOM for large resources by capping
// how many rows are materialized in memory at once.
for (int i = 0; i < _exportJobRecord.SurrogateIdRanges.Count; i++)
rangeQueryParams.Add(Tuple.Create(KnownQueryParameterNames.EndSurrogateId, range.EndId));
rangeQueryParams.Add(Tuple.Create(KnownQueryParameterNames.StartSurrogateId, range.StartId));

_exportJobRecord.Progress = new ExportJobProgress(continuationToken: null, page: 0);
// how many rows are materialized in memory at once.
for (int i = 0; i < _exportJobRecord.SurrogateIdRanges.Count; i++)
{
var range = _exportJobRecord.SurrogateIdRanges[i];
Comment on lines +250 to +251
rangeQueryParams.Add(Tuple.Create(KnownQueryParameterNames.EndSurrogateId, range.EndId));
rangeQueryParams.Add(Tuple.Create(KnownQueryParameterNames.StartSurrogateId, range.StartId));
Comment on lines +253 to +254
_exportJobRecord.Progress = new ExportJobProgress(continuationToken: null, page: 0);
ExportJobProgress rangeProgress = _exportJobRecord.Progress;
Comment on lines +237 to +248
var rangeQueryParams = new List<Tuple<string, string>>(queryParametersList);

// Replace the _count parameter with the per-range value
for (int p = 0; p < rangeQueryParams.Count; p++)
{
if (rangeQueryParams[p].Item1 == KnownQueryParameterNames.Count)
{
rangeQueryParams[p] = Tuple.Create(KnownQueryParameterNames.Count, rangeCount.ToString(CultureInfo.InvariantCulture));
break;
}
}

Comment on lines +108 to +109
var rangeBatches = new List<List<(long StartId, long EndId, int Count)>>();
var currentBatch = new List<(long StartId, long EndId, int Count)>();
Comment on lines +117 to +122
currentBatch.Add(range);
if (currentBatch.Count >= ExportJobConfiguration.RangeReductionFactor)
{
rangeBatches.Add(currentBatch);
currentBatch = new List<(long StartId, long EndId, int Count)>();
}
Comment on lines +130 to +133
foreach (var batch in rangeBatches)
{
_logger.LogJobInformation(jobInfo, "Creating export record (1).");
var processingRecord = CreateExportRecord(record, jobInfo.GroupId, resourceType: type, startSurrogateId: range.StartId.ToString(), endSurrogateId: range.EndId.ToString(), globalStartSurrogateId: globalStartId.ToString(), globalEndSurrogateId: globalEndId.ToString());
var processingRecord = CreateExportRecord(
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants