Commit fe6cb20

committed
[DOCS] Rewrite regexp query (#42711)
1 parent c1a7d12 commit fe6cb20

6 files changed

+210
-281
lines changed

docs/reference/index-modules.asciidoc

Lines changed: 1 addition & 0 deletions
@@ -205,6 +205,7 @@ specific index module:
 The maximum number of terms that can be used in Terms Query.
 Defaults to `65536`.
 
+[[index-max-regex-length]]
 `index.max_regex_length`::
 
 The maximum length of regex that can be used in Regexp Query.
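
As a rough illustration of what this setting guards against, a client could check a pattern's length before sending it. The sketch below is not part of Elasticsearch: the helper name `check_regex_length` is hypothetical, and the 1,000-character default is the one stated in the rewritten `regexp` query docs in this commit. It also assumes that a pattern exactly at the limit is accepted. Python's `len` counts code points, so it only approximates the server-side check.

```python
# Default quoted in the regexp query docs of this commit; an index may override it
# through the `index.max_regex_length` setting.
DEFAULT_MAX_REGEX_LENGTH = 1000


def check_regex_length(pattern: str, max_length: int = DEFAULT_MAX_REGEX_LENGTH) -> None:
    """Raise ValueError if `pattern` is longer than `max_length` (hypothetical helper).

    Assumption: a pattern whose length equals the limit is still allowed.
    """
    if len(pattern) > max_length:
        raise ValueError(
            f"regex length {len(pattern)} exceeds the limit of {max_length}; "
            "shorten the pattern or raise index.max_regex_length"
        )


if __name__ == "__main__":
    check_regex_length("k.*y")  # short pattern, passes silently
    try:
        check_regex_length("a" * 1001)
    except ValueError as err:
        print(err)
```

A check like this only gives earlier feedback; the index setting remains the authority on what is accepted.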

docs/reference/query-dsl.asciidoc

Lines changed: 3 additions & 1 deletion
@@ -47,4 +47,6 @@ include::query-dsl/term-level-queries.asciidoc[]
 
 include::query-dsl/minimum-should-match.asciidoc[]
 
-include::query-dsl/multi-term-rewrite.asciidoc[]
+include::query-dsl/multi-term-rewrite.asciidoc[]
+
+include::query-dsl/regexp-syntax.asciidoc[]

docs/reference/query-dsl/regexp-query.asciidoc

Lines changed: 63 additions & 75 deletions
@@ -4,98 +4,86 @@
 <titleabbrev>Regexp</titleabbrev>
 ++++
 
-The `regexp` query allows you to use regular expression term queries.
-See <<regexp-syntax>> for details of the supported regular expression language.
-The "term queries" in that first sentence means that Elasticsearch will apply
-the regexp to the terms produced by the tokenizer for that field, and not
-to the original text of the field.
+Returns documents that contain terms matching a
+https://en.wikipedia.org/wiki/Regular_expression[regular expression].
 
-*Note*: The performance of a `regexp` query heavily depends on the
-regular expression chosen. Matching everything like `.*` is very slow as
-well as using lookaround regular expressions. If possible, you should
-try to use a long prefix before your regular expression starts. Wildcard
-matchers like `.*?+` will mostly lower performance.
+A regular expression is a way to match patterns in data using placeholder
+characters, called operators. For a list of operators supported by the
+`regexp` query, see <<regexp-syntax, Regular expression syntax>>.
 
-[source,js]
---------------------------------------------------
-GET /_search
-{
-    "query": {
-        "regexp":{
-            "name.first": "s.*y"
-        }
-    }
-}
---------------------------------------------------
-// CONSOLE
+[[regexp-query-ex-request]]
+==== Example request
 
-Boosting is also supported
+The following search returns documents where the `user` field contains any term
+that begins with `k` and ends with `y`. The `.*` operators match any
+characters of any length, including no characters. Matching
+terms can include `ky`, `kay`, and `kimchy`.
 
 [source,js]
---------------------------------------------------
+----
 GET /_search
 {
     "query": {
-        "regexp":{
-            "name.first":{
-                "value":"s.*y",
-                "boost":1.2
+        "regexp": {
+            "user": {
+                "value": "k.*y",
+                "flags" : "ALL",
+                "max_determinized_states": 10000,
+                "rewrite": "constant_score"
             }
         }
     }
 }
---------------------------------------------------
+----
 // CONSOLE
 
-You can also use special flags
 
-[source,js]
---------------------------------------------------
-GET /_search
-{
-    "query": {
-        "regexp":{
-            "name.first": {
-                "value": "s.*y",
-                "flags" : "INTERSECTION|COMPLEMENT|EMPTY"
-            }
-        }
-    }
-}
---------------------------------------------------
-// CONSOLE
+[[regexp-top-level-params]]
+==== Top-level parameters for `regexp`
+`<field>`::
+(Required, object) Field you wish to search.
 
-Possible flags are `ALL` (default), `ANYSTRING`, `COMPLEMENT`,
-`EMPTY`, `INTERSECTION`, `INTERVAL`, or `NONE`. Please check the
-http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/util/automaton/RegExp.html[Lucene
-documentation] for their meaning
+[[regexp-query-field-params]]
+==== Parameters for `<field>`
+`value`::
+(Required, string) Regular expression for terms you wish to find in the provided
+`<field>`. For a list of supported operators, see <<regexp-syntax, Regular
+expression syntax>>.
++
+--
+By default, regular expressions are limited to 1,000 characters. You can change
+this limit using the <<index-max-regex-length, `index.max_regex_length`>>
+setting.
 
-Regular expressions are dangerous because it's easy to accidentally
-create an innocuous looking one that requires an exponential number of
-internal determinized automaton states (and corresponding RAM and CPU)
-for Lucene to execute. Lucene prevents these using the
-`max_determinized_states` setting (defaults to 10000). You can raise
-this limit to allow more complex regular expressions to execute.
+[WARNING]
+=====
+The performance of the `regexp` query can vary based on the regular expression
+provided. To improve performance, avoid using wildcard patterns, such as `.*` or
+`.*?+`, without a prefix or suffix.
+=====
+--
 
-[source,js]
---------------------------------------------------
-GET /_search
-{
-    "query": {
-        "regexp":{
-            "name.first": {
-                "value": "s.*y",
-                "flags" : "INTERSECTION|COMPLEMENT|EMPTY",
-                "max_determinized_states": 20000
-            }
-        }
-    }
-}
---------------------------------------------------
-// CONSOLE
+`flags`::
+(Optional, string) Enables optional operators for the regular expression. For
+valid values and more information, see <<regexp-optional-operators, Regular
+expression syntax>>.
+
+`max_determinized_states`::
++
+--
+(Optional, integer) Maximum number of
+https://en.wikipedia.org/wiki/Deterministic_finite_automaton[automaton states]
+required for the query. Default is `10000`.
+
+{es} uses https://lucene.apache.org/core/[Apache Lucene] internally to parse
+regular expressions. Lucene converts each regular expression to a finite
+automaton containing a number of determinized states.
 
-NOTE: By default the maximum length of regex string allowed in a Regexp Query
-is limited to 1000. You can update the `index.max_regex_length` index setting
-to bypass this limit.
+You can use this parameter to prevent that conversion from unintentionally
+consuming too many resources. You may need to increase this limit to run complex
+regular expressions.
+--
 
-include::regexp-syntax.asciidoc[]
+`rewrite`::
+(Optional, string) Method used to rewrite the query. For valid values and more
+information, see the <<query-dsl-multi-term-rewrite, `rewrite` parameter>>.
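
To make the parameter reference in this diff more concrete, here is a minimal Python sketch that assembles the example request body and illustrates which terms `k.*y` would match. It is an approximation, not the documented client API: `build_regexp_query` and `matches_term` are hypothetical helpers, and Python's `re.fullmatch` stands in for Lucene's matcher, whose regular expression syntax differs from Python's. It assumes the pattern must match a whole indexed term, in line with the removed text noting that the regexp applies to terms rather than to the original field text.

```python
import json
import re


def build_regexp_query(field, value, flags="ALL",
                       max_determinized_states=10000,
                       rewrite="constant_score"):
    """Build a `regexp` query body using the parameters listed in this diff.

    Hypothetical helper; the defaults mirror the example request, not
    necessarily the server-side defaults.
    """
    return {
        "query": {
            "regexp": {
                field: {
                    "value": value,
                    "flags": flags,
                    "max_determinized_states": max_determinized_states,
                    "rewrite": rewrite,
                }
            }
        }
    }


def matches_term(pattern, term):
    """Approximate term matching: the pattern must cover the entire term.

    Assumption: Python's `re` is used only as a stand-in for Lucene's engine.
    """
    return re.fullmatch(pattern, term) is not None


if __name__ == "__main__":
    # Body equivalent to the `GET /_search` example in the diff.
    print(json.dumps(build_regexp_query("user", "k.*y"), indent=2))

    # Terms from the docs, plus two that only partially match the pattern.
    for term in ["ky", "kay", "kimchy", "kyle", "sky"]:
        print(term, matches_term("k.*y", term))
```

The printed body can be sent as the payload of `GET /_search`; the term check is only meant to show why `kyle` and `sky` would not qualify even though they contain `k` and `y`.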
