Skip to content

HPCC-35248 add scope filters to XRef#20609

Merged
ghalliday merged 1 commit intohpcc-systems:candidate-10.2.xfrom
jackdelv:issue35248
Feb 27, 2026
Merged

HPCC-35248 add scope filters to XRef#20609
ghalliday merged 1 commit intohpcc-systems:candidate-10.2.xfrom
jackdelv:issue35248

Conversation

@jackdelv
Copy link
Contributor

@jackdelv jackdelv commented Nov 6, 2025

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Copilot AI review requested due to automatic review settings November 6, 2025 16:23
@github-actions
Copy link

github-actions bot commented Nov 6, 2025

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-35248

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a scope filtering feature to the XRef (cross-reference) build functionality. Users can now specify scope filters (e.g., "scope1::scope2") when generating XRef reports to limit the scan to specific file scopes, reducing processing time and resource usage for targeted XRef builds.

Key Changes

  • Adds FilterScopes parameter to the DFUXRefBuild API request and plumbs it through the entire stack from the web UI to the backend XRef processing
  • Implements directory and file filtering logic in the XRef scanner to skip directories and files outside the specified scope patterns
  • Adds UI input field with local storage persistence for the filter scopes setting

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
esp/scm/ws_dfuXref.ecm Adds FilterScopes string parameter to DFUXRefBuildRequest schema
esp/services/ws_dfu/ws_dfuXRefService.cpp Retrieves FilterScopes from request and passes it to XRef node
dali/dfuXRefLib/XRefNodeManager.hpp Declares getFilterScopes/setFilterScopes methods in IXRefNode interface
dali/dfuXRefLib/XRefNodeManager.cpp Implements FilterScopes getter/setter to persist the value in the XRef tree
dali/sasha/sacmd.hpp Declares queryFilterScopes/setFilterScopes methods in ISashaCommand interface
dali/sasha/sacmd.cpp Implements FilterScopes member and serialization in CSashaCommand
dali/sasha/saxref.cpp Core implementation: parses scope filters using "::" delimiter, implements directory scanning with scope-based filtering, and filters logical files from Dali based on scope patterns
system/jlib/jutil.cpp Modifies DelimToStringArray to track multi-character delimiter length and advance pointer correctly after matching
esp/src/src-react/components/Xrefs.tsx Adds TextField for filter scopes input with localStorage persistence and passes value to DFUXRefBuild API
esp/src/src-dojo/nls/hpcc.ts Adds localized placeholder text for the filter scopes input field

Comment on lines +170 to +175
<TextField
styles={{ root: { width: 250, marginLeft: 8, marginRight: 8, marginTop: 6 } }}
placeholder={nlsHPCC.ExampleScopePlaceholder}
value={filterScopes}
onChange={(_, newValue) => handleFilterScopesChange(newValue || "")}
/>
Copy link

Copilot AI Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TextField component is missing an accessible label. While a placeholder is provided, placeholders are not sufficient for accessibility as they disappear when text is entered and are not reliably announced by screen readers.

Add a label prop to the TextField to provide a persistent, accessible label. For example:

<TextField
    label={nlsHPCC.FilterScopes} // or similar localized string
    styles={{ root: { width: 250, marginLeft: 8, marginRight: 8, marginTop: 6 } }}
    placeholder={nlsHPCC.ExampleScopePlaceholder}
    value={filterScopes}
    onChange={(_, newValue) => handleFilterScopesChange(newValue || "")}
/>

Copilot uses AI. Check for mistakes.
Comment on lines +2952 to +2955
// Get filterScopes for this cluster
const char *scopes = cluster.queryProp("@filterScopes");
if (scopes && *scopes)
filterScopes.set(scopes);
Copy link

Copilot AI Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When multiple clusters are submitted with different filterScopes values, only the last cluster's filterScopes will be retained because filterScopes.set(scopes) overwrites the previous value instead of handling multiple clusters appropriately.

This could be problematic if the system supports building XRef for multiple clusters simultaneously with different scope filters. Consider whether this is the intended behavior, and if not, implement proper handling for multiple clusters (e.g., using a map of cluster to filterScopes, or documenting that only single-cluster submissions support filterScopes).

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with copilot.
If only 1 is support for now, should at least be an error/exception here.

@jackdelv jackdelv requested a review from jakesmith November 6, 2025 16:56
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv - looks good overall, see comments.

mb.read(state);
mb.read(owner);
mb.read(cluster);
mb.read(filterScopes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will break backward compatibility..
You can see [horrible] code below which is looking for newer field values on the end of the buffer .. it would have go there if anywhere. I hate how inflexible it is.

It's tempting to change the protocol (e.g. to a json or serialized IPT), but this [server] would have to cope with the current legacy format, and newer clients would have to check if server could cope with new format somehow, e.g. maybe in accept if message is blank, send back version #.

This should be a separate self-contained task though, and prob. OTT for this, but you will need to ensure backward compat. with this change.

Comment on lines +2952 to +2955
// Get filterScopes for this cluster
const char *scopes = cluster.queryProp("@filterScopes");
if (scopes && *scopes)
filterScopes.set(scopes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with copilot.
If only 1 is support for now, should at least be an error/exception here.

@jackdelv
Copy link
Contributor Author

jackdelv commented Dec 9, 2025

@jakesmith I refactored the logic as it was too confusing and didn't support multiple filter scopes. I added support for multiple filters and added a function to check the scope against any filters. I think this is much clearer what is happening although it is less efficient. I didn't think efficiency was as important since having scope filters turned on greatly reduces the amount of data that is processed (Makes a cluster that usually takes ~40mins to run finish almost instantly).

@jackdelv jackdelv requested a review from jakesmith December 9, 2025 20:33
@jackdelv jackdelv added Feature Notable Change A notable change that is likely to be worth highlighting in any version that is released. labels Jan 15, 2026
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv - please see comments and questions.

mb.read(beforeWU);
}
if (mb.remaining() > 0)
mb.read(filterScopes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this approach looks like a problem if afterWU/beforeWU aren't set.
The serialize code:

        mb.append(sortDescending);
        if (afterWU.get() || beforeWU.get()) {
            mb.append(afterWU);
            mb.append(beforeWU);
        }
        mb.append(filterScopes);

so if afterWU/beforeWU are empty, they won't be serialized, but filterScopes will be, which is going to cause the deserialize code to break (i.e. read filterScopes as afterWU).

In testing, was afterWU/beforeWU always set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in my testing afterWU/beforeWU were always set. I'm not sure where they come from though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see CSashaCommnd::serialize - afterWU and beforeWU are only serialized if set, and they're not necessarily set..

so the new unconditional serialization of filterScopes is going to be seen as read as the beforeWU if beforeWU wasn't set and break things.

So this needs more care.

I think:

  • change the serialization version # (1 -> 2).
  • change deserialization code to only deserialize new info if version >= 2
  • change to unconditionally send afterWU/beforeWU (check it's okay that blank, I think it is)
  • up SASHAVERSION

This helps but is still flawed, because an a new saserver may send a command with filterScopes to an old one, but if new one unconditional sends afterWU/beforeWU it should be 'ok' as far as that piece of code is concerned.

This is all horrible, it would be far better if it serialized a IPT so didn't have these flaw remaining() branches approach, but would still have the worry re. compat. with older servers.. not worth it at this juncture.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I upped the serialization version and SASHAVERSION. I changed it to unconditionally serialize/deserialze the afterWU/beforeWU and filterScopes values in the new version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change doesn't look quite right
It will now never deserialize afterWU and beforeWU if version=1, so a newer command sent with before/after will be ignored.
There should be an else, which falls back to previous logic checking remaining()

StringBuffer fname;
offset_t nsz = 0;
StringArray dirs;
bool scanFiles = fullScopeMatchesFilter(path.str()+filePathOffset+1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this approach mean that as it's recursiing down, it's repeatedly, reparsing and rechecking all path segments, when to be here, we know they must have matched? (same true of partialScopeMatchesFilter)

I think if you pre-split the filter scope, then you should only need to compare the current directory path/scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as it recurses down it repeatedly reparses the path. This is because there are multiple filters supported. If we just check the filter at the current scope, then we could match a filter at that level but not have matched it previously.

Copy link
Member

@jakesmith jakesmith Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv - let's discuss offline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think this is a quite a big waste, maybe not that impactful to speed, but the deeper it goes, the more it's wasting time scanning over filters that it should know match already.
But perhaps return to this after this PR to improve.

@jackdelv jackdelv changed the base branch from candidate-9.12.x to candidate-10.2.x February 12, 2026 14:41
@jackdelv jackdelv requested a review from jakesmith February 12, 2026 21:09
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv - please see comments, I think there are a few more issues.

A [small] unittest may be warranted that could have exposed some of these.


virtual StringBuffer& getCluster(StringBuffer& Cluster);
virtual void setCluster(const char* str);
virtual StringBuffer& getFilterScopes(StringBuffer& filterScopes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this other IXRefNode impls. should have 'override' really.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added override to IXRefNode methods

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing virtuals to accompany them

mb.read(beforeWU);
}
if (mb.remaining() > 0)
mb.read(filterScopes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see CSashaCommnd::serialize - afterWU and beforeWU are only serialized if set, and they're not necessarily set..

so the new unconditional serialization of filterScopes is going to be seen as read as the beforeWU if beforeWU wasn't set and break things.

So this needs more care.

I think:

  • change the serialization version # (1 -> 2).
  • change deserialization code to only deserialize new info if version >= 2
  • change to unconditionally send afterWU/beforeWU (check it's okay that blank, I think it is)
  • up SASHAVERSION

This helps but is still flawed, because an a new saserver may send a command with filterScopes to an old one, but if new one unconditional sends afterWU/beforeWU it should be 'ok' as far as that piece of code is concerned.

This is all horrible, it would be far better if it serialized a IPT so didn't have these flaw remaining() branches approach, but would still have the worry re. compat. with older servers.. not worth it at this juncture.

{
if (fileScopeEnd == fileScope)
{
for (;i<numItemsi;i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the functionality of this block is confusing... (use of hidden 'numItemsi' .. is confusing too, thought wouldn't compile at 1st glance).

What is the logic here..
It looks like it's saying, if wound back through lfn scopes and no matches, then match (return false) if any are empty.. but if that was wanted.. then it means all files match if any are blank, and why bother checking any?
and, why are there any empty scopeFilters in the 1st place?

again could do with comments in code to show intent if not obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The while loop at the beginning removes the filename to get just the scope. I moved this out of the for loop since the logical filename won't change. I simplified the code when no scope separator is found to use forEachItemIn instead of reusing the hidden variable.

The reason I left in blank filters was because I wasn't sure how to allow files found in the root directory that have no scope. Are they always filtered/not filtered or should there be a way to specify?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I left in blank filters was because I wasn't sure how to allow files found in the root directory that have no scope. Are they always filtered/not filtered or should there be a way to specify?

I'm not sure I would have bothered allowing a filter that included top-scope. I don't imagine it will ever be used.
However, I think if wasn't special cases, it would have been less of a source of confusion.
Really whether it's a scope filter of "x", or "", it is still the same semantics, match the scope of the file, e.g. does "x" match the leading scopes of "x::file", or does "" match the leading scope of file "file2". In both cases they do without special casing.

i.e. I think you can simplify by removing the special case.
The '// Check if scope matches any filter' already copes with a zero length scope, i.e. strncmp of length 0 is always 0, so I think loop could be just:

                // Get pointer to end of scope in logical file name
                while (fileScopeEnd != fileScope)
                {
                    if (fileScopeEnd[0] == ':' && fileScopeEnd[1] == ':')
                        break;
                    fileScopeEnd--;
                }

but probably just as efficient if not more (and arguably functionally clearer), to use std::string_view/rfind:

                std::string_view sv(fileScope, logicalFileName.length());
                size_t pos = sv.rfind("::");
                size_t prefixLength = (pos != std::string_view::npos) ? pos : logicalFileName.length();

@jackdelv jackdelv requested a review from jakesmith February 23, 2026 14:55
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv - please see comments.

mb.read(beforeWU);
}
if (mb.remaining() > 0)
mb.read(filterScopes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change doesn't look quite right
It will now never deserialize afterWU and beforeWU if version=1, so a newer command sent with before/after will be ignored.
There should be an else, which falls back to previous logic checking remaining()

dts[i].serialize(mb);

mb.append(sortDescending);
if (afterWU.get() || beforeWU.get()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine, uncondtionally sending, the old logic dealt with that, but see comment re. deserialize side.

void setXRefData(IPropertyTree & pTree);

void BuildXRefData(IPropertyTree & pTree,const char* Cluster);
void BuildXRefData(IPropertyTree & pTree,const char* Cluster) override;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know not new, but as adding 'override' please always also add 'virtual'


virtual StringBuffer& getCluster(StringBuffer& Cluster);
virtual void setCluster(const char* str);
virtual StringBuffer& getFilterScopes(StringBuffer& filterScopes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing virtuals to accompany them

{
if (fileScopeEnd == fileScope)
{
for (;i<numItemsi;i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I left in blank filters was because I wasn't sure how to allow files found in the root directory that have no scope. Are they always filtered/not filtered or should there be a way to specify?

I'm not sure I would have bothered allowing a filter that included top-scope. I don't imagine it will ever be used.
However, I think if wasn't special cases, it would have been less of a source of confusion.
Really whether it's a scope filter of "x", or "", it is still the same semantics, match the scope of the file, e.g. does "x" match the leading scopes of "x::file", or does "" match the leading scope of file "file2". In both cases they do without special casing.

i.e. I think you can simplify by removing the special case.
The '// Check if scope matches any filter' already copes with a zero length scope, i.e. strncmp of length 0 is always 0, so I think loop could be just:

                // Get pointer to end of scope in logical file name
                while (fileScopeEnd != fileScope)
                {
                    if (fileScopeEnd[0] == ':' && fileScopeEnd[1] == ':')
                        break;
                    fileScopeEnd--;
                }

but probably just as efficient if not more (and arguably functionally clearer), to use std::string_view/rfind:

                std::string_view sv(fileScope, logicalFileName.length());
                size_t pos = sv.rfind("::");
                size_t prefixLength = (pos != std::string_view::npos) ? pos : logicalFileName.length();

@jackdelv jackdelv requested a review from jakesmith February 23, 2026 21:36
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv - final 2 trivial comments.
Please look at and squash, and I'll give it one more look over.

mb.read(beforeWU);
mb.read(filterScopes);
}
else if (version == 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conditional on version == 1 here is ok, but it was not conditional prior to this PR at all (only on mb.reamaining()), so strictly speaknig it would be more correct to not make it dependent on version == 1 now.
Admittedly version=1 dates back to ancient history so it is highly unlikely there is a version 0 out there...

{
unsigned filterLen = strlen(parent.scopeFilters.item(i));
if (filterLen != (unsigned)(fileScopeEnd - fileScope))
if (filterLen != (unsigned)prefixLength)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: size_t is the natural type here. strlen returns a size_t, and strncmp takes a size_t

@jackdelv
Copy link
Contributor Author

@jakesmith Added fixes from your comments and squashed.

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv - looks good.

@ghalliday ghalliday merged commit 86cddeb into hpcc-systems:candidate-10.2.x Feb 27, 2026
53 of 54 checks passed
@github-actions
Copy link

Jirabot Action Result:
Added fix version: 10.2.14
Workflow Transition: 'Resolve issue'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Feature Notable Change A notable change that is likely to be worth highlighting in any version that is released.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants