fix(elasticsearch): escape Lucene special chars in values only, not field patterns (APIM-13030)#15831
fix(elasticsearch): escape Lucene special chars in values only, not field patterns (APIM-13030)#15831
Conversation
…ield patterns (APIM-13030)
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical regression in Elasticsearch query handling that previously caused 'Search by payload' functionality to fail. The issue stemmed from an overly broad application of Lucene special character escaping, which incorrectly modified field patterns. The fix introduces a more precise escaping strategy, ensuring that only the values within a field:value query clause are escaped, while preserving the integrity of field names and patterns. This restores correct search functionality and improves the robustness of Elasticsearch queries. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request refactors the Lucene query string escaping mechanism in ElasticLogRepository to prevent incorrect query parsing. The changes introduce a more granular escaping process that specifically targets the value portions of field:value clauses, preserving field names and patterns. New helper methods like escapeFilterValues, escapeClauseValues, escapeSingleClauseValue, and escapeValueForLuceneJson were added to achieve this, along with expanded unit tests. However, there are still several issues identified: the getQuery method's filter parsing is fragile and can break if a filter value contains a colon; the escapeValueForLuceneJson method only escapes a subset of Lucene's special characters, potentially leading to incorrect query parsing; and the new unit tests are currently validating dead code paths, requiring refactoring to test the live getQuery logic for proper validation and regression prevention.
| .map(filter -> { | ||
| final String filterKey = filter[0]; | ||
| final String escapedValue = escapeValueForLuceneJson(filter[1]); | ||
| if ("body".equals(filterKey)) { | ||
| return "\\\\*.body" + ":" + filter[1]; | ||
| return "\\\\*.body" + ":" + escapedValue; | ||
| } else { | ||
| return filterKey + ":" + filter[1]; | ||
| return filterKey + ":" + escapedValue; | ||
| } |
There was a problem hiding this comment.
This block assumes filter is a two-element array derived from split(":") in the preceding stream operation. This is fragile and will break if a filter value contains a colon (e.g., a URL), causing parts of the value to be lost. A more robust approach is to use indexOf(':') to find the first colon and split the field from the value. A good example of this is in the escapeSingleClauseValue method added in this same PR, although it is currently part of an unused code path.
| private static String escapeValueForLuceneJson(String value) { | ||
| return value.replace("\\", "\\\\\\\\").replace("/", "\\\\/").replace("(", "\\\\(").replace(")", "\\\\)"); | ||
| } |
There was a problem hiding this comment.
This method only escapes a small subset of Lucene's special characters (\, /, (, )). The full list of special characters is + - && || ! ( ) { } [ ] ^ \" ~ * ? : \ /. Omitting characters like +, -, *, and ? can lead to incorrect query parsing and unexpected search results. For example, a search for a value containing api-key could be misinterpreted. Please consider expanding this method to cover all special characters. Apache Lucene's QueryParser.escape() method provides a reference for a complete implementation.
| private String invokeEscapeFilterValues(String filter) { | ||
| try { | ||
| Method method = ElasticLogRepository.class.getDeclaredMethod("escapeFilterValues", String.class); | ||
| method.setAccessible(true); | ||
| return (String) method.invoke(new ElasticLogRepository(), filter); | ||
| } catch (Exception e) { | ||
| throw new RuntimeException(e); | ||
| } | ||
| } |
There was a problem hiding this comment.
This test helper invokes escapeFilterValues, but that method appears to be dead code in ElasticLogRepository. Consequently, the tests using this helper (e.g., compareImplementationWithStandardParser, escapeFilterValues_*) are not validating the production code path, which now runs through the getQuery method. The tests should be refactored to validate the behavior of the live getQuery logic to ensure it is correctly implemented and to prevent future regressions.
| private static String escapeClauseValues(String clause) { | ||
| // Handle OR-separated sub-clauses (e.g. "_id:x OR _id:y") | ||
| if (clause.contains(" OR ")) { | ||
| return stream(clause.split(" OR ")).map(ElasticLogRepository::escapeSingleClauseValue).collect(joining(" OR ")); | ||
| } | ||
| return escapeSingleClauseValue(clause); |
There was a problem hiding this comment.
I think we can add a test for this to verify field names survive because of explicit "OR" handling
mukul-tyagi08
left a comment
There was a problem hiding this comment.
A small comment for test case and also the commit has JIRA reference, not sure if that's okay.
Summary
createSafeElasticsearchJsonQuery()over-escapes backslashes in the\\*.bodywildcard field prefix, making ALL body/payload searches fail silentlygetQuery()where field:value is already parsed — escape only values, preserve\\*.bodyfield patternTest plan
test-user) returns correct results@(user@example.com) returns correct results$($99) returns correct resultsstatus:200) still workstatus:200 AND body:term) still workLuceneEscapeTest)🔗 APIM-13030
🤖 Generated with Claude Code