 <titleabbrev>Regexp</titleabbrev>
 ++++
 
-The `regexp` query allows you to use regular expression term queries.
-See <<regexp-syntax>> for details of the supported regular expression language.
-The "term queries" in that first sentence means that Elasticsearch will apply
-the regexp to the terms produced by the tokenizer for that field, and not
-to the original text of the field.
+Returns documents that contain terms matching a
+https://en.wikipedia.org/wiki/Regular_expression[regular expression].
 
-*Note*: The performance of a `regexp` query heavily depends on the
-regular expression chosen. Matching everything like `.*` is very slow as
-well as using lookaround regular expressions. If possible, you should
-try to use a long prefix before your regular expression starts. Wildcard
-matchers like `.*?+` will mostly lower performance.
+A regular expression is a way to match patterns in data using placeholder
+characters, called operators. For a list of operators supported by the
+`regexp` query, see <<regexp-syntax, Regular expression syntax>>.
 
-[source,js]
---------------------------------------------------
-GET /_search
-{
-    "query": {
-        "regexp":{
-            "name.first": "s.*y"
-        }
-    }
-}
---------------------------------------------------
-// CONSOLE
+[[regexp-query-ex-request]]
+==== Example request
 
-Boosting is also supported
+The following search returns documents where the `user` field contains any term
+that begins with `k` and ends with `y`. The `.*` operators match any
+characters of any length, including no characters. Matching
+terms can include `ky`, `kay`, and `kimchy`.
 
 [source,js]
---------------------------------------------------
+----
 GET /_search
 {
     "query": {
-        "regexp":{
-            "name.first":{
-                "value":"s.*y",
-                "boost":1.2
+        "regexp": {
+            "user": {
+                "value": "k.*y",
+                "flags" : "ALL",
+                "max_determinized_states": 10000,
+                "rewrite": "constant_score"
             }
         }
     }
 }
---------------------------------------------------
+----
 // CONSOLE
 
-You can also use special flags
 
-[source,js]
---------------------------------------------------
-GET /_search
-{
-    "query": {
-        "regexp":{
-            "name.first": {
-                "value": "s.*y",
-                "flags" : "INTERSECTION|COMPLEMENT|EMPTY"
-            }
-        }
-    }
-}
---------------------------------------------------
-// CONSOLE
+[[regexp-top-level-params]]
+==== Top-level parameters for `regexp`
+`<field>`::
+(Required, object) Field you wish to search.
 
-Possible flags are `ALL` (default), `ANYSTRING`, `COMPLEMENT`,
-`EMPTY`, `INTERSECTION`, `INTERVAL`, or `NONE`. Please check the
-http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/util/automaton/RegExp.html[Lucene
-documentation] for their meaning
+[[regexp-query-field-params]]
+==== Parameters for `<field>`
+`value`::
+(Required, string) Regular expression for terms you wish to find in the provided
+`<field>`. For a list of supported operators, see <<regexp-syntax, Regular
+expression syntax>>.
++
+--
+By default, regular expressions are limited to 1,000 characters. You can change
+this limit using the <<index-max-regex-length, `index.max_regex_length`>>
+setting.
 
-Regular expressions are dangerous because it's easy to accidentally
-create an innocuous looking one that requires an exponential number of
-internal determinized automaton states (and corresponding RAM and CPU)
-for Lucene to execute. Lucene prevents these using the
-`max_determinized_states` setting (defaults to 10000). You can raise
-this limit to allow more complex regular expressions to execute.
+[WARNING]
+=====
+The performance of the `regexp` query can vary based on the regular expression
+provided. To improve performance, avoid using wildcard patterns, such as `.*` or
+`.*?+`, without a prefix or suffix.
+=====
+--
 
-[source,js]
---------------------------------------------------
-GET /_search
-{
-    "query": {
-        "regexp":{
-            "name.first": {
-                "value": "s.*y",
-                "flags" : "INTERSECTION|COMPLEMENT|EMPTY",
-                "max_determinized_states": 20000
-            }
-        }
-    }
-}
---------------------------------------------------
-// CONSOLE
+`flags`::
+(Optional, string) Enables optional operators for the regular expression. For
+valid values and more information, see <<regexp-optional-operators, Regular
+expression syntax>>.
+
+`max_determinized_states`::
++
+--
+(Optional, integer) Maximum number of
+https://en.wikipedia.org/wiki/Deterministic_finite_automaton[automaton states]
+required for the query. Default is `10000`.
+
+{es} uses https://lucene.apache.org/core/[Apache Lucene] internally to parse
+regular expressions. Lucene converts each regular expression to a finite
+automaton containing a number of determinized states.
 
-NOTE: By default the maximum length of regex string allowed in a Regexp Query
-is limited to 1000. You can update the `index.max_regex_length` index setting
-to bypass this limit.
+You can use this parameter to prevent that conversion from unintentionally
+consuming too many resources. You may need to increase this limit to run complex
+regular expressions.
+--
 
-include::regexp-syntax.asciidoc[]
+`rewrite`::
+(Optional, string) Method used to rewrite the query. For valid values and more
+information, see the <<query-dsl-multi-term-rewrite, `rewrite` parameter>>.
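
Reviewer note: the example request in this change says that `k.*y` matches any term that begins with `k` and ends with `y`, so the pattern has to cover the whole term, not just part of it. The Python sketch below is one rough way to check that reading against a few sample terms. It is only an approximation: Python's `re` module stands in for the Lucene regular expression engine, whose syntax differs for other operators, and the helper name `matching_terms` and the extra sample terms `kayak` and `sky` are assumptions made up for this illustration.

```python
import re


def matching_terms(pattern, terms):
    """Return the terms that the pattern matches in full.

    Assumption: re.fullmatch approximates the whole-term matching that
    the example request describes. It is not the Lucene engine.
    """
    compiled = re.compile(pattern)
    return [term for term in terms if compiled.fullmatch(term)]


# Hypothetical indexed terms for the `user` field.
sample_terms = ["ky", "kay", "kimchy", "kayak", "sky"]

# Only terms that start with "k" and end with "y" are kept.
print(matching_terms("k.*y", sample_terms))
```

Under these assumptions, `kayak` is rejected because it does not end with `y`, and `sky` is rejected because it does not begin with `k`, even though both contain a substring that fits the pattern.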