@aguest-kc aguest-kc commented Oct 31, 2025

Description:

Remove the Hadoop copy merge step from Spark downloads and use the correct number of partitions instead of 1.

Technical Details:

Set the number of partitions to total records / max records per file instead of 1. Skipped the Hadoop copy merge step, since with this change each part file already contains the correct number of records. Added a method for renaming the part files to the expected format.
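
A minimal sketch of the two steps described above, not the code from this PR. It assumes the partition count is rounded up and never drops below 1, and that Spark writes files matching `part-*.csv`. The function names `partition_count` and `rename_part_files`, and the `<base_name>_<n>.csv` target format, are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def partition_count(total_records: int, max_records_per_file: int) -> int:
    """Return how many partitions are needed so no file exceeds the limit.

    Assumption: round up (ceiling division) and always use at least 1.
    """
    if max_records_per_file <= 0:
        raise ValueError("max_records_per_file must be positive")
    # Integer ceiling division avoids float rounding on large counts.
    return max(1, (total_records + max_records_per_file - 1) // max_records_per_file)


def rename_part_files(output_dir: Path, base_name: str, extension: str = "csv") -> list[Path]:
    """Rename Spark-style part files to <base_name>_<n>.<extension>.

    Assumption: part files are named like part-00000-<suffix>.csv, so a
    lexicographic sort preserves partition order.
    """
    part_files = sorted(output_dir.glob(f"part-*.{extension}"))
    renamed = []
    for index, part_file in enumerate(part_files, start=1):
        target = output_dir / f"{base_name}_{index}.{extension}"
        part_file.rename(target)
        renamed.append(target)
    return renamed
```

In a Spark job, the value returned by `partition_count` would typically be passed to `DataFrame.repartition` before writing, so that each output part file can be renamed and zipped directly without a merge step.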

Requirements for PR Merge:

  1. Unit & integration tests updated
  2. Data validation completed (examples listed below)
    1. Does this work well with the current frontend? Or is the frontend aware of a needed change?
    2. Is performance impacted in the changes (e.g., API, pipeline, downloads, etc.)?
    3. Is the expected data returned with the expected format?
  3. Jira Ticket(s)
    1. DEV-12528

Explain N/A in above checklist:

  1. API documentation updated (examples listed below)
    No API contracts need to be updated for this change.

  2. Appropriate Operations ticket(s) created
    No operation tickets are needed for this change.

sethstoudenmier previously approved these changes Nov 4, 2025

@sethstoudenmier sethstoudenmier left a comment

Approving, but left some comments on possible cleanup / improvements.

@aguest-kc aguest-kc merged commit 495bd5e into qat Nov 5, 2025
36 of 37 checks passed
@aguest-kc aguest-kc deleted the ftr/dev-12528-spark-download-zipping branch December 9, 2025 14:51