Commit de3e9a3

[DOCS] Clarify regex character range case insensitivity limitations
Flagged by @dadoonet in Slack:

> Hey team. There's an interesting discussion on discuss about Lucene regex support. Our documentation says things like:
>
> [a-c]   # matches 'a', 'b', or 'c'
> [^a-c]  # matches any character except 'a', 'b', or 'c'
>
> Which is correct unless you use as well case_insensitive: true option. In which case you would expect B to match [a-c]. But it does not work that way and it's a known limitation as Robert answered in apache/lucene#14378 (comment). Could we document that somehow?
1 parent 7444595 commit de3e9a3

1 file changed: +9 -0 lines changed

docs/reference/query-dsl/regexp-syntax.asciidoc

Lines changed: 9 additions & 0 deletions
@@ -181,6 +181,15 @@ example:
 [^-abc] # matches any character except '-', 'a', 'b', or 'c'
 [^abc\-] # matches any character except 'a', 'b', 'c', or '-'
 ....
+
+[NOTE]
+====
+Character range classes such as `[a-c]` do not behave as expected when using `case_insensitive: true` — they remain case sensitive.
+For example, `[a-c]+` with `case_insensitive: true` will match strings containing only the characters 'a', 'b', and 'c', but not 'A', 'B', or 'C'.
+Use `[a-zA-Z]` to match both uppercase and lowercase characters.
+This is due to a known limitation in Lucene's regular expression engine.
+See https://github.com/apache/lucene/issues/14378[Lucene issue #14378] for details.
+====
 --
 
[discrete]
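
The workaround described in the added note, spelling out both cases inside the character class instead of relying on `case_insensitive: true`, can be sketched as follows. This is only an illustration: `both_cases_class` is a hypothetical helper, not part of Elasticsearch or Lucene, the field name `user.id` is an assumption, and Python's `re` module (without `IGNORECASE`) stands in for the regex engine purely to check the pattern.

```python
import re


def both_cases_class(start: str, end: str) -> str:
    """Build a character class covering an ASCII letter range in both cases.

    Hypothetical helper for illustration; not an Elasticsearch or Lucene API.
    """
    if len(start) != 1 or len(end) != 1:
        raise ValueError("start and end must be single characters")
    if not (start.isascii() and end.isascii() and start.isalpha() and end.isalpha()):
        raise ValueError("only ASCII letters are supported in this sketch")
    lo, hi = start.lower(), end.lower()
    if lo > hi:
        raise ValueError("range start must not come after range end")
    # Spell out the lowercase and uppercase ranges explicitly.
    return f"[{lo}-{hi}{lo.upper()}-{hi.upper()}]"


# The explicit class carries both cases, so no case-insensitivity flag is needed.
pattern = both_cases_class("a", "c") + "+"

# Assumed shape of a regexp query body; "user.id" is a placeholder field name.
query = {"query": {"regexp": {"user.id": {"value": pattern}}}}

# Sanity check with Python's `re` as a stand-in (fullmatch anchors the pattern,
# and no IGNORECASE flag is passed).
print(pattern, bool(re.fullmatch(pattern, "aBc")))
```

For the `[a-c]` range from the note, the helper produces `[a-cA-C]`, which is narrower than the `[a-zA-Z]` suggested in the note and covers only the letters in the original range.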
