HPCC-35248 add scope filters to XRef#20609
HPCC-35248 add scope filters to XRef#20609ghalliday merged 1 commit intohpcc-systems:candidate-10.2.xfrom
Conversation
|
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-35248 Jirabot Action Result: |
There was a problem hiding this comment.
Pull Request Overview
This PR adds a scope filtering feature to the XRef (cross-reference) build functionality. Users can now specify scope filters (e.g., "scope1::scope2") when generating XRef reports to limit the scan to specific file scopes, reducing processing time and resource usage for targeted XRef builds.
Key Changes
- Adds
FilterScopesparameter to the DFUXRefBuild API request and plumbs it through the entire stack from the web UI to the backend XRef processing - Implements directory and file filtering logic in the XRef scanner to skip directories and files outside the specified scope patterns
- Adds UI input field with local storage persistence for the filter scopes setting
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| esp/scm/ws_dfuXref.ecm | Adds FilterScopes string parameter to DFUXRefBuildRequest schema |
| esp/services/ws_dfu/ws_dfuXRefService.cpp | Retrieves FilterScopes from request and passes it to XRef node |
| dali/dfuXRefLib/XRefNodeManager.hpp | Declares getFilterScopes/setFilterScopes methods in IXRefNode interface |
| dali/dfuXRefLib/XRefNodeManager.cpp | Implements FilterScopes getter/setter to persist the value in the XRef tree |
| dali/sasha/sacmd.hpp | Declares queryFilterScopes/setFilterScopes methods in ISashaCommand interface |
| dali/sasha/sacmd.cpp | Implements FilterScopes member and serialization in CSashaCommand |
| dali/sasha/saxref.cpp | Core implementation: parses scope filters using "::" delimiter, implements directory scanning with scope-based filtering, and filters logical files from Dali based on scope patterns |
| system/jlib/jutil.cpp | Modifies DelimToStringArray to track multi-character delimiter length and advance pointer correctly after matching |
| esp/src/src-react/components/Xrefs.tsx | Adds TextField for filter scopes input with localStorage persistence and passes value to DFUXRefBuild API |
| esp/src/src-dojo/nls/hpcc.ts | Adds localized placeholder text for the filter scopes input field |
| <TextField | ||
| styles={{ root: { width: 250, marginLeft: 8, marginRight: 8, marginTop: 6 } }} | ||
| placeholder={nlsHPCC.ExampleScopePlaceholder} | ||
| value={filterScopes} | ||
| onChange={(_, newValue) => handleFilterScopesChange(newValue || "")} | ||
| /> |
There was a problem hiding this comment.
The TextField component is missing an accessible label. While a placeholder is provided, placeholders are not sufficient for accessibility as they disappear when text is entered and are not reliably announced by screen readers.
Add a label prop to the TextField to provide a persistent, accessible label. For example:
<TextField
label={nlsHPCC.FilterScopes} // or similar localized string
styles={{ root: { width: 250, marginLeft: 8, marginRight: 8, marginTop: 6 } }}
placeholder={nlsHPCC.ExampleScopePlaceholder}
value={filterScopes}
onChange={(_, newValue) => handleFilterScopesChange(newValue || "")}
/>
dali/sasha/saxref.cpp
Outdated
| // Get filterScopes for this cluster | ||
| const char *scopes = cluster.queryProp("@filterScopes"); | ||
| if (scopes && *scopes) | ||
| filterScopes.set(scopes); |
There was a problem hiding this comment.
When multiple clusters are submitted with different filterScopes values, only the last cluster's filterScopes will be retained because filterScopes.set(scopes) overwrites the previous value instead of handling multiple clusters appropriately.
This could be problematic if the system supports building XRef for multiple clusters simultaneously with different scope filters. Consider whether this is the intended behavior, and if not, implement proper handling for multiple clusters (e.g., using a map of cluster to filterScopes, or documenting that only single-cluster submissions support filterScopes).
There was a problem hiding this comment.
I agree with copilot.
If only 1 is support for now, should at least be an error/exception here.
dali/sasha/sacmd.cpp
Outdated
| mb.read(state); | ||
| mb.read(owner); | ||
| mb.read(cluster); | ||
| mb.read(filterScopes); |
There was a problem hiding this comment.
this will break backward compatibility..
You can see [horrible] code below which is looking for newer field values on the end of the buffer .. it would have go there if anywhere. I hate how inflexible it is.
It's tempting to change the protocol (e.g. to a json or serialized IPT), but this [server] would have to cope with the current legacy format, and newer clients would have to check if server could cope with new format somehow, e.g. maybe in accept if message is blank, send back version #.
This should be a separate self-contained task though, and prob. OTT for this, but you will need to ensure backward compat. with this change.
dali/sasha/saxref.cpp
Outdated
| // Get filterScopes for this cluster | ||
| const char *scopes = cluster.queryProp("@filterScopes"); | ||
| if (scopes && *scopes) | ||
| filterScopes.set(scopes); |
There was a problem hiding this comment.
I agree with copilot.
If only 1 is support for now, should at least be an error/exception here.
|
@jakesmith I refactored the logic as it was too confusing and didn't support multiple filter scopes. I added support for multiple filters and added a function to check the scope against any filters. I think this is much clearer what is happening although it is less efficient. I didn't think efficiency was as important since having scope filters turned on greatly reduces the amount of data that is processed (Makes a cluster that usually takes ~40mins to run finish almost instantly). |
| mb.read(beforeWU); | ||
| } | ||
| if (mb.remaining() > 0) | ||
| mb.read(filterScopes); |
There was a problem hiding this comment.
this approach looks like a problem if afterWU/beforeWU aren't set.
The serialize code:
mb.append(sortDescending);
if (afterWU.get() || beforeWU.get()) {
mb.append(afterWU);
mb.append(beforeWU);
}
mb.append(filterScopes);
so if afterWU/beforeWU are empty, they won't be serialized, but filterScopes will be, which is going to cause the deserialize code to break (i.e. read filterScopes as afterWU).
In testing, was afterWU/beforeWU always set?
There was a problem hiding this comment.
Yes, in my testing afterWU/beforeWU were always set. I'm not sure where they come from though.
There was a problem hiding this comment.
see CSashaCommnd::serialize - afterWU and beforeWU are only serialized if set, and they're not necessarily set..
so the new unconditional serialization of filterScopes is going to be seen as read as the beforeWU if beforeWU wasn't set and break things.
So this needs more care.
I think:
- change the serialization version # (1 -> 2).
- change deserialization code to only deserialize new info if version >= 2
- change to unconditionally send afterWU/beforeWU (check it's okay that blank, I think it is)
- up SASHAVERSION
This helps but is still flawed, because an a new saserver may send a command with filterScopes to an old one, but if new one unconditional sends afterWU/beforeWU it should be 'ok' as far as that piece of code is concerned.
This is all horrible, it would be far better if it serialized a IPT so didn't have these flaw remaining() branches approach, but would still have the worry re. compat. with older servers.. not worth it at this juncture.
There was a problem hiding this comment.
Ok, I upped the serialization version and SASHAVERSION. I changed it to unconditionally serialize/deserialze the afterWU/beforeWU and filterScopes values in the new version.
There was a problem hiding this comment.
The change doesn't look quite right
It will now never deserialize afterWU and beforeWU if version=1, so a newer command sent with before/after will be ignored.
There should be an else, which falls back to previous logic checking remaining()
| StringBuffer fname; | ||
| offset_t nsz = 0; | ||
| StringArray dirs; | ||
| bool scanFiles = fullScopeMatchesFilter(path.str()+filePathOffset+1); |
There was a problem hiding this comment.
doesn't this approach mean that as it's recursiing down, it's repeatedly, reparsing and rechecking all path segments, when to be here, we know they must have matched? (same true of partialScopeMatchesFilter)
I think if you pre-split the filter scope, then you should only need to compare the current directory path/scope.
There was a problem hiding this comment.
Yes, as it recurses down it repeatedly reparses the path. This is because there are multiple filters supported. If we just check the filter at the current scope, then we could match a filter at that level but not have matched it previously.
There was a problem hiding this comment.
I do think this is a quite a big waste, maybe not that impactful to speed, but the deeper it goes, the more it's wasting time scanning over filters that it should know match already.
But perhaps return to this after this PR to improve.
dali/dfuXRefLib/XRefNodeManager.hpp
Outdated
|
|
||
| virtual StringBuffer& getCluster(StringBuffer& Cluster); | ||
| virtual void setCluster(const char* str); | ||
| virtual StringBuffer& getFilterScopes(StringBuffer& filterScopes); |
There was a problem hiding this comment.
minor: this other IXRefNode impls. should have 'override' really.
There was a problem hiding this comment.
Added override to IXRefNode methods
There was a problem hiding this comment.
missing virtuals to accompany them
| mb.read(beforeWU); | ||
| } | ||
| if (mb.remaining() > 0) | ||
| mb.read(filterScopes); |
There was a problem hiding this comment.
see CSashaCommnd::serialize - afterWU and beforeWU are only serialized if set, and they're not necessarily set..
so the new unconditional serialization of filterScopes is going to be seen as read as the beforeWU if beforeWU wasn't set and break things.
So this needs more care.
I think:
- change the serialization version # (1 -> 2).
- change deserialization code to only deserialize new info if version >= 2
- change to unconditionally send afterWU/beforeWU (check it's okay that blank, I think it is)
- up SASHAVERSION
This helps but is still flawed, because an a new saserver may send a command with filterScopes to an old one, but if new one unconditional sends afterWU/beforeWU it should be 'ok' as far as that piece of code is concerned.
This is all horrible, it would be far better if it serialized a IPT so didn't have these flaw remaining() branches approach, but would still have the worry re. compat. with older servers.. not worth it at this juncture.
dali/sasha/saxref.cpp
Outdated
| { | ||
| if (fileScopeEnd == fileScope) | ||
| { | ||
| for (;i<numItemsi;i++) |
There was a problem hiding this comment.
the functionality of this block is confusing... (use of hidden 'numItemsi' .. is confusing too, thought wouldn't compile at 1st glance).
What is the logic here..
It looks like it's saying, if wound back through lfn scopes and no matches, then match (return false) if any are empty.. but if that was wanted.. then it means all files match if any are blank, and why bother checking any?
and, why are there any empty scopeFilters in the 1st place?
again could do with comments in code to show intent if not obvious.
There was a problem hiding this comment.
The while loop at the beginning removes the filename to get just the scope. I moved this out of the for loop since the logical filename won't change. I simplified the code when no scope separator is found to use forEachItemIn instead of reusing the hidden variable.
The reason I left in blank filters was because I wasn't sure how to allow files found in the root directory that have no scope. Are they always filtered/not filtered or should there be a way to specify?
There was a problem hiding this comment.
The reason I left in blank filters was because I wasn't sure how to allow files found in the root directory that have no scope. Are they always filtered/not filtered or should there be a way to specify?
I'm not sure I would have bothered allowing a filter that included top-scope. I don't imagine it will ever be used.
However, I think if wasn't special cases, it would have been less of a source of confusion.
Really whether it's a scope filter of "x", or "", it is still the same semantics, match the scope of the file, e.g. does "x" match the leading scopes of "x::file", or does "" match the leading scope of file "file2". In both cases they do without special casing.
i.e. I think you can simplify by removing the special case.
The '// Check if scope matches any filter' already copes with a zero length scope, i.e. strncmp of length 0 is always 0, so I think loop could be just:
// Get pointer to end of scope in logical file name
while (fileScopeEnd != fileScope)
{
if (fileScopeEnd[0] == ':' && fileScopeEnd[1] == ':')
break;
fileScopeEnd--;
}
but probably just as efficient if not more (and arguably functionally clearer), to use std::string_view/rfind:
std::string_view sv(fileScope, logicalFileName.length());
size_t pos = sv.rfind("::");
size_t prefixLength = (pos != std::string_view::npos) ? pos : logicalFileName.length();
| mb.read(beforeWU); | ||
| } | ||
| if (mb.remaining() > 0) | ||
| mb.read(filterScopes); |
There was a problem hiding this comment.
The change doesn't look quite right
It will now never deserialize afterWU and beforeWU if version=1, so a newer command sent with before/after will be ignored.
There should be an else, which falls back to previous logic checking remaining()
| dts[i].serialize(mb); | ||
|
|
||
| mb.append(sortDescending); | ||
| if (afterWU.get() || beforeWU.get()) { |
There was a problem hiding this comment.
this is fine, uncondtionally sending, the old logic dealt with that, but see comment re. deserialize side.
dali/dfuXRefLib/XRefNodeManager.hpp
Outdated
| void setXRefData(IPropertyTree & pTree); | ||
|
|
||
| void BuildXRefData(IPropertyTree & pTree,const char* Cluster); | ||
| void BuildXRefData(IPropertyTree & pTree,const char* Cluster) override; |
There was a problem hiding this comment.
I know not new, but as adding 'override' please always also add 'virtual'
dali/dfuXRefLib/XRefNodeManager.hpp
Outdated
|
|
||
| virtual StringBuffer& getCluster(StringBuffer& Cluster); | ||
| virtual void setCluster(const char* str); | ||
| virtual StringBuffer& getFilterScopes(StringBuffer& filterScopes); |
There was a problem hiding this comment.
missing virtuals to accompany them
dali/sasha/saxref.cpp
Outdated
| { | ||
| if (fileScopeEnd == fileScope) | ||
| { | ||
| for (;i<numItemsi;i++) |
There was a problem hiding this comment.
The reason I left in blank filters was because I wasn't sure how to allow files found in the root directory that have no scope. Are they always filtered/not filtered or should there be a way to specify?
I'm not sure I would have bothered allowing a filter that included top-scope. I don't imagine it will ever be used.
However, I think if wasn't special cases, it would have been less of a source of confusion.
Really whether it's a scope filter of "x", or "", it is still the same semantics, match the scope of the file, e.g. does "x" match the leading scopes of "x::file", or does "" match the leading scope of file "file2". In both cases they do without special casing.
i.e. I think you can simplify by removing the special case.
The '// Check if scope matches any filter' already copes with a zero length scope, i.e. strncmp of length 0 is always 0, so I think loop could be just:
// Get pointer to end of scope in logical file name
while (fileScopeEnd != fileScope)
{
if (fileScopeEnd[0] == ':' && fileScopeEnd[1] == ':')
break;
fileScopeEnd--;
}
but probably just as efficient if not more (and arguably functionally clearer), to use std::string_view/rfind:
std::string_view sv(fileScope, logicalFileName.length());
size_t pos = sv.rfind("::");
size_t prefixLength = (pos != std::string_view::npos) ? pos : logicalFileName.length();
dali/sasha/sacmd.cpp
Outdated
| mb.read(beforeWU); | ||
| mb.read(filterScopes); | ||
| } | ||
| else if (version == 1) { |
There was a problem hiding this comment.
conditional on version == 1 here is ok, but it was not conditional prior to this PR at all (only on mb.reamaining()), so strictly speaknig it would be more correct to not make it dependent on version == 1 now.
Admittedly version=1 dates back to ancient history so it is highly unlikely there is a version 0 out there...
dali/sasha/saxref.cpp
Outdated
| { | ||
| unsigned filterLen = strlen(parent.scopeFilters.item(i)); | ||
| if (filterLen != (unsigned)(fileScopeEnd - fileScope)) | ||
| if (filterLen != (unsigned)prefixLength) |
There was a problem hiding this comment.
trivial: size_t is the natural type here. strlen returns a size_t, and strncmp takes a size_t
|
@jakesmith Added fixes from your comments and squashed. |
86cddeb
into
hpcc-systems:candidate-10.2.x
|
Jirabot Action Result: |
Type of change:
Checklist:
Smoketest:
Testing: