HPCC-35694 DFUQuery sort by compressed size #20885

Open
GordonSmith wants to merge 4 commits into hpcc-systems:candidate-10.2.x from
GordonSmith:HPCC-35694-DFUSORTBY_COMPRESSEDSIZE

Conversation

@GordonSmith
Member

@GordonSmith GordonSmith commented Jan 23, 2026

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Copilot AI review requested due to automatic review settings January 23, 2026 13:06
@github-actions

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-35694

Jirabot Action Result:
Assigning user: gordon.smith@lexisnexisrisk.com
Workflow Transition To: Merge Pending
Updated PR

Contributor

Copilot AI left a comment

Pull request overview

This pull request adds sorting capability by compressed file size in the DFUQuery service. The change enables users to sort logical files by their compressed size, with appropriate fallback to uncompressed size for non-compressed files.

Changes:

  • Added frontend UI enhancements to display compressed size with visual indicators for compressed vs uncompressed files
  • Implemented backend sorting logic for compressed file size in the DFU service
  • Added mapping for compressed size field in the sort order configuration
  • Updated ESDL service interface documentation with version bumping guidelines
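
The fallback described in the overview (compressed size where available, otherwise uncompressed size) can be sketched as a sort key plus a comparator. This is a minimal illustration only: `FileEntry`, `compressedSortKey`, and `sortByCompressedSize` are hypothetical names, and the use of -1 for "not available" is an assumption. The actual service sorts property-tree attributes rather than a struct like this.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record used only for this sketch.
struct FileEntry
{
    std::string name;
    int64_t size;            // uncompressed size in bytes, -1 if unknown (assumption)
    int64_t compressedSize;  // compressed size in bytes, -1 if not compressed (assumption)
};

// Sort key: the compressed size when present, otherwise the uncompressed size.
inline int64_t compressedSortKey(const FileEntry &f)
{
    return f.compressedSize >= 0 ? f.compressedSize : f.size;
}

// Stable sort so entries with equal keys keep their incoming order.
inline void sortByCompressedSize(std::vector<FileEntry> &files, bool descending = false)
{
    std::stable_sort(files.begin(), files.end(),
        [descending](const FileEntry &a, const FileEntry &b)
        {
            const int64_t ka = compressedSortKey(a);
            const int64_t kb = compressedSortKey(b);
            return descending ? ka > kb : ka < kb;
        });
}
```

With this key, an uncompressed file is ordered by its plain size alongside compressed files, which matches the "size in brackets" display described for the UI.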

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File: Description
esp/src/src-react/components/Files.tsx: Changed CompressedFileSize column to be sortable, added dimmed styling for uncompressed files showing their size in brackets
esp/services/ws_dfu/ws_dfuService.hpp: Added declaration for findPositionByCompressedSize function
esp/services/ws_dfu/ws_dfuService.cpp: Implemented findPositionByCompressedSize sorting function and added CompressedFileSize to legacy sort mappings
dali/base/dadfs.cpp: Added logic to populate compressedsize field for file attributes, with fallback to origsize
.github/instructions/esp-service-interface.instructions.md: Added version bumping guidelines section to ESDL instructions

@GordonSmith GordonSmith requested a review from jakesmith January 23, 2026 13:11
@GordonSmith
Member Author

GordonSmith commented Jan 23, 2026

@jakesmith tagging you for review as I think you made the changes in and around DFUSFsize. I tried to make this a minimal change, but IMO it really should be refactored to:

  • size
  • compressedSize
  • fileSize = isCompressed ? compressedSize : size
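
As a rough sketch of that proposed shape (not code from this PR), the three values could be modelled as below. `FileSizes` and its member names are hypothetical, and deriving `fileSize` in an accessor rather than storing it is one possible reading of the third bullet.

```cpp
#include <cstdint>

// Hypothetical type illustrating the proposed refactor.
struct FileSizes
{
    int64_t size;            // logical (uncompressed) size in bytes
    int64_t compressedSize;  // compressed size in bytes; only meaningful when isCompressed
    bool isCompressed;

    // Size on disk: fileSize = isCompressed ? compressedSize : size
    int64_t fileSize() const { return isCompressed ? compressedSize : size; }
};
```

Deriving `fileSize` on demand means it cannot drift out of step with the two stored values.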

@GordonSmith GordonSmith force-pushed the HPCC-35694-DFUSORTBY_COMPRESSEDSIZE branch 2 times, most recently from 320b00c to c689e95 Compare January 23, 2026 14:22
@GordonSmith GordonSmith force-pushed the HPCC-35694-DFUSORTBY_COMPRESSEDSIZE branch from c689e95 to cd736df Compare January 23, 2026 14:37
Member

@jakesmith jakesmith left a comment

@GordonSmith - I had a look into this and the history of DFUQResultField::origsize etc. a bit.
See comment.

@GordonSmith GordonSmith marked this pull request as draft January 29, 2026 08:05
@GordonSmith GordonSmith force-pushed the HPCC-35694-DFUSORTBY_COMPRESSEDSIZE branch 2 times, most recently from 72afcb2 to e38741a Compare February 9, 2026 12:47
@GordonSmith GordonSmith requested a review from Copilot February 9, 2026 12:48
@GordonSmith GordonSmith marked this pull request as ready for review February 9, 2026 12:49
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

@GordonSmith GordonSmith force-pushed the HPCC-35694-DFUSORTBY_COMPRESSEDSIZE branch 2 times, most recently from 6296923 to e5d793e Compare February 12, 2026 09:04
Signed-off-by: Gordon Smith <GordonJSmith@gmail.com>
Signed-off-by: Gordon Smith <GordonJSmith@gmail.com>
@GordonSmith GordonSmith force-pushed the HPCC-35694-DFUSORTBY_COMPRESSEDSIZE branch from e5d793e to 9b85d09 Compare February 16, 2026 12:53
@GordonSmith
Member Author

rebased

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

    if (options.includeField(DFUQResultField::origsize))
    {
        const char *propName = getDFUQResultFieldName(DFUQResultField::origsize);
        attr->setPropInt64(propName, attr->getPropInt64(getDFUQResultFieldName(DFUQResultField::origsize), -1));

Copilot AI Feb 16, 2026

This block sets @origsize to the value of @origsize, which is effectively a no-op when present, but will also create @origsize = -1 when absent. If the goal is to ensure the field exists only when already populated, consider guarding with hasProp(propName) (or omit this block entirely). If the goal is to force materialization, add a clarifying comment and avoid emitting -1 as a sentinel that could leak into clients.

Suggested change
-   attr->setPropInt64(propName, attr->getPropInt64(getDFUQResultFieldName(DFUQResultField::origsize), -1));
+   if (attr->hasProp(propName))
+       attr->setPropInt64(propName, attr->getPropInt64(propName));

Member

I kind of agree, I suppose, but not quite with what Copilot is suggesting. The code is trying to default this prop to -1 if it is not set, so it would be better to do:

    if (!attr->hasProp(propName))
        attr->setPropInt64(propName, -1);

But, re. the previous comments that were on lines 14206-14208,
and the comment at the end of line 14210 ("//Sort the files with empty size to front"):
aren't blank entries already at the front?
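
To make the difference between the two forms concrete, here is a small self-contained sketch. `MockAttrs` is a hypothetical stand-in backed by a `std::map`, not the platform's property tree class, and it only mimics the three calls that appear in the snippets above.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a property tree node (assumption, for illustration only).
class MockAttrs
{
public:
    bool hasProp(const std::string &name) const { return props.count(name) != 0; }

    // Returns the stored value, or dft when the property is absent.
    int64_t getPropInt64(const std::string &name, int64_t dft = 0) const
    {
        auto it = props.find(name);
        return it == props.end() ? dft : it->second;
    }

    void setPropInt64(const std::string &name, int64_t value) { props[name] = value; }

private:
    std::map<std::string, int64_t> props;
};

// Read-then-write form: rewrites the same value when present, writes -1 when absent.
inline void defaultViaRoundTrip(MockAttrs &attr, const std::string &propName)
{
    attr.setPropInt64(propName, attr.getPropInt64(propName, -1));
}

// Guarded form: touches the property only when it is missing.
inline void defaultIfMissing(MockAttrs &attr, const std::string &propName)
{
    if (!attr.hasProp(propName))
        attr.setPropInt64(propName, -1);
}
```

In this mock both functions leave the attribute in the same final state; the guarded form avoids the redundant write and states the intent (default to -1) directly.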

Member

@jakesmith jakesmith left a comment

@GordonSmith - please see comments/questions.

// Size field is notionally the size on disk
const char *propName = getDFUQResultFieldName(DFUQResultField::size);
attr->setPropInt64(propName, attr->getPropInt64(getDFUQResultFieldName(DFUQResultField::origsize), -1));//Sort the files with empty size to front
attr->setPropInt64(propName, isCompressed(*attr) ? attr->getPropInt64(getDFUQResultFieldName(DFUQResultField::compressedsize), -1) : attr->getPropInt64(getDFUQResultFieldName(DFUQResultField::origsize), -1));
Member

DFUQResultField::size is a special field (@DFUSFsize) that is only set here.

@GordonSmith - as far as I could see last time I looked, neither Eclwatch nor anything else uses it. I thought Eclwatch used "@size", which is DFUQResultField::origsize? (See the comment in the code here prior to this change.)

If so, this code should be deleted (and probably DFUQResultField::size/@DFUSFsize too, but that could be part of a separate cleanup PR).

Contributor

To elaborate on what Jake's saying here: as best I can see, the @DFUSFsize value is not exposed in the DFUQuery response. The legacyMappings are used to map name mismatches between the SortBy strings submitted in the request and the PTree attributes returned from CDistributedFileDirectory::getLogicalFiles (which, if I'm correct, uses deserializeFileAttrIter(...) to create the returned PTree).

The missing link is that the WsDFUHelpers::addToLogicalFileList(...) function isn't pulling out the @DFUSFsize attribute from the response. Even if it were, I'm not entirely sure where it should be returned.

Currently it pulls @size aka DFUQResultField::origsize to populate IntSize and TotalSize response fields. And it pulls @compressedSize to populate the CompressedFileSize response field.

The DFUQuery response would probably need to be modified with a new field for something like FileSize to hold @DFUSFsize if you wanted it:

https://github.com/hpcc-systems/HPCC-Platform/blob/master/esp/scm/ws_dfu_common.ecm
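
The role of such a mapping can be sketched as a small lookup from a request-side SortBy string to the attribute name used in the returned tree. Everything here is illustrative: `resolveSortAttribute` is a hypothetical helper, the two entries are assumptions built from names mentioned in this thread rather than a copy of the real legacyMappings table, and the pass-through for unmapped names is also an assumption.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: translate a SortBy string into a tree attribute name.
inline std::string resolveSortAttribute(const std::string &sortBy)
{
    // Assumed entries for illustration; not the service's actual table.
    static const std::unordered_map<std::string, std::string> mappings = {
        {"CompressedFileSize", "@compressedSize"},
        {"IntSize", "@size"},
    };

    auto it = mappings.find(sortBy);
    if (it != mappings.end())
        return it->second;

    // Assumption: names without a mapping are passed through unchanged.
    return sortBy;
}
```

A lookup like this only affects which attribute the sort reads; it does not by itself expose any additional field in the response.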

CPPUNIT_ASSERT_MESSAGE("testGetLogicalFilesSorted: Missing cost attributes", costAttrsPresent);
bool dirAttrsPresent = attrs.hasProp("@directory");
CPPUNIT_ASSERT_MESSAGE("testGetLogicalFilesSorted: directory attribute should NOT be present", !dirAttrsPresent);
CPPUNIT_ASSERT_MESSAGE("testGetLogicalFilesSorted: size field missing", attrs.hasProp("@DFUSFsize"));
Member

I don't know why we have @DFUSFsize at the moment - what if anything uses it?

@GordonSmith GordonSmith requested a review from asselitx February 26, 2026 09:29

Labels

None yet

Projects

None yet

5 participants