-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add XmlProcessor initial implementation #130337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
🔍 Preview links for changed docs: 🔔 The preview site may take up to 3 minutes to finish building. These links will become live once it completes. |
5bd50d5 to
95df637
Compare
95df637 to
67dd264
Compare
|
Pinging @elastic/es-data-management (Team:Data Management) |
|
Hi @marc-gr, I've created a changelog YAML for you. |
|
Sorry, github has been being weird at me and sticking some of my comments in pending when I didn't expect it to. I was meaning to reply to the question about the noop processor. The other comments are nits, I'm not sure whether they're still valid on the latest iteration of the code. |
The callers are always either static-ly a String or a List, so let's just have static arities for exactly and only those versions.
They don't make a drug for what's wrong with me.
| this.xpathExpressions = xpathExpressions != null ? Map.copyOf(xpathExpressions) : Map.of(); | ||
| this.namespaces = namespaces != null ? Map.copyOf(namespaces) : Map.of(); | ||
|
|
||
| this.compiledXPathExpressions = compileXPathExpressions(this.xpathExpressions, this.namespaces); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For that matter, The XPathFactory class is not thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://jcp.org/aboutJava/communityprocess/review/jsr063/jaxp-pd1.pdf, though, it seems that the DocumentBuilderFactory and SAXParserFactory classes are threadsafe, so XPathFactory is the odd ball. This is fun, we're having fun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#134923 is tangentially related to this conversation (not the thread safety part, though, which is the primary thrust of this conversation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Use pre-configured secure DOM factory | ||
| // Since we build DOM programmatically (createElementNS/createElement), | ||
| // the factory's namespace awareness doesn't affect our usage | ||
| DocumentBuilder builder = XmlFactories.DOM_FACTORY.newDocumentBuilder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newDocumentBuilder is too expensive to invoke per-document. We're going to have to do the same threadlocal business on that one, too.
The existing implementation always does the work of building the structured result, even if the the structured result is not used for anything. That hurts performance in the xpath-expressions-only case. |
|
I merged |
| return XmlUtils.getHardenedXPath(); | ||
| } catch (Exception e) { | ||
| logger.warn("Cannot configure secure XPath object - XML processor may not work correctly", e); | ||
| return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed the existing pattern of returning null here, but honestly I think we should blow up and die much more aggressively if this happens. The processor is just going to throw NPEs at every step of the way anyway, and that seems... unproductive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: The throwing null pattern was introduced at a point when we were eagerly creating things in static initializers. It was my suggested approach to avoid a node which lacked the required functionality failing to start up, before anyone even tried to create an XML processor. If the thread local changes mean we're now creating things lazily and will only throw when you try to use the missing functionality, we might want to revisit the exception handling.
|
More notes from myself to myself: |


This PR creates a new XML processor that achieves feature parity with Logstash's XML filter.
⚙️ Configuration Options
🏗️ Architecture
📚 Documentation
Documentation includes:
Logstash differences
ignore_empty_valuebehaves a bit different thansuppress_empty, but I think it matches better with other processors behavior. It could be adapted, or even add both, but I found it confusing.Closes #97364