|
4 | 4 | <titleabbrev>Regexp</titleabbrev> |
5 | 5 | ++++ |
6 | 6 |
|
7 | | -The `regexp` query allows you to use regular expression term queries. |
8 | | -See <<regexp-syntax>> for details of the supported regular expression language. |
9 | | -The "term queries" in that first sentence means that Elasticsearch will apply |
10 | | -the regexp to the terms produced by the tokenizer for that field, and not |
11 | | -to the original text of the field. |
| 7 | +Returns documents that contain terms matching a |
| 8 | +https://en.wikipedia.org/wiki/Regular_expression[regular expression]. |
12 | 9 |
|
13 | | -*Note*: The performance of a `regexp` query heavily depends on the |
14 | | -regular expression chosen. Matching everything like `.*` is very slow as |
15 | | -well as using lookaround regular expressions. If possible, you should |
16 | | -try to use a long prefix before your regular expression starts. Wildcard |
17 | | -matchers like `.*?+` will mostly lower performance. |
| 10 | +A regular expression is a way to match patterns in data using placeholder |
| 11 | +characters, called operators. For a list of operators supported by the |
| 12 | +`regexp` query, see <<regexp-syntax, Regular expression syntax>>. |
18 | 13 |
|
19 | | -[source,js] |
20 | | --------------------------------------------------- |
21 | | -GET /_search |
22 | | -{ |
23 | | - "query": { |
24 | | - "regexp":{ |
25 | | - "name.first": "s.*y" |
26 | | - } |
27 | | - } |
28 | | -} |
29 | | --------------------------------------------------- |
30 | | -// CONSOLE |
| 14 | +[[regexp-query-ex-request]] |
| 15 | +==== Example request |
31 | 16 |
|
32 | | -Boosting is also supported |
| 17 | +The following search returns documents where the `user` field contains any term |
| 18 | +that begins with `k` and ends with `y`. The `.*` operators match any sequence of
| 19 | +characters, including an empty one. Matching
| 20 | +terms can include `ky`, `kay`, and `kimchy`. |
33 | 21 |
|
34 | 22 | [source,js] |
35 | | --------------------------------------------------- |
| 23 | +---- |
36 | 24 | GET /_search |
37 | 25 | { |
38 | 26 | "query": { |
39 | | - "regexp":{ |
40 | | - "name.first":{ |
41 | | - "value":"s.*y", |
42 | | - "boost":1.2 |
| 27 | + "regexp": { |
| 28 | + "user": { |
| 29 | + "value": "k.*y", |
| 30 | + "flags" : "ALL", |
| 31 | + "max_determinized_states": 10000, |
| 32 | + "rewrite": "constant_score" |
43 | 33 | } |
44 | 34 | } |
45 | 35 | } |
46 | 36 | } |
47 | | --------------------------------------------------- |
| 37 | +---- |
48 | 38 | // CONSOLE |
49 | 39 |
|
50 | | -You can also use special flags |
51 | 40 |
|
52 | | -[source,js] |
53 | | --------------------------------------------------- |
54 | | -GET /_search |
55 | | -{ |
56 | | - "query": { |
57 | | - "regexp":{ |
58 | | - "name.first": { |
59 | | - "value": "s.*y", |
60 | | - "flags" : "INTERSECTION|COMPLEMENT|EMPTY" |
61 | | - } |
62 | | - } |
63 | | - } |
64 | | -} |
65 | | --------------------------------------------------- |
66 | | -// CONSOLE |
| 41 | +[[regexp-top-level-params]] |
| 42 | +==== Top-level parameters for `regexp` |
| 43 | +`<field>`:: |
| 44 | +(Required, object) Field you wish to search. |
67 | 45 |
|
68 | | -Possible flags are `ALL` (default), `ANYSTRING`, `COMPLEMENT`, |
69 | | -`EMPTY`, `INTERSECTION`, `INTERVAL`, or `NONE`. Please check the |
70 | | -http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/util/automaton/RegExp.html[Lucene |
71 | | -documentation] for their meaning |
| 46 | +[[regexp-query-field-params]] |
| 47 | +==== Parameters for `<field>` |
| 48 | +`value`:: |
| 49 | +(Required, string) Regular expression for terms you wish to find in the provided |
| 50 | +`<field>`. For a list of supported operators, see <<regexp-syntax, Regular |
| 51 | +expression syntax>>. |
| 52 | ++ |
| 53 | +-- |
| 54 | +By default, regular expressions are limited to 1,000 characters. You can change |
| 55 | +this limit using the <<index-max-regex-length, `index.max_regex_length`>> |
| 56 | +setting. |
72 | 57 |
|
73 | | -Regular expressions are dangerous because it's easy to accidentally |
74 | | -create an innocuous looking one that requires an exponential number of |
75 | | -internal determinized automaton states (and corresponding RAM and CPU) |
76 | | -for Lucene to execute. Lucene prevents these using the |
77 | | -`max_determinized_states` setting (defaults to 10000). You can raise |
78 | | -this limit to allow more complex regular expressions to execute. |
| 58 | +[WARNING] |
| 59 | +===== |
| 60 | +The performance of the `regexp` query can vary based on the regular expression |
| 61 | +provided. To improve performance, avoid using wildcard patterns, such as `.*` or |
| 62 | +`.*?+`, without a prefix or suffix. |
| 63 | +===== |
| 64 | +-- |
79 | 65 |
|
80 | | -[source,js] |
81 | | --------------------------------------------------- |
82 | | -GET /_search |
83 | | -{ |
84 | | - "query": { |
85 | | - "regexp":{ |
86 | | - "name.first": { |
87 | | - "value": "s.*y", |
88 | | - "flags" : "INTERSECTION|COMPLEMENT|EMPTY", |
89 | | - "max_determinized_states": 20000 |
90 | | - } |
91 | | - } |
92 | | - } |
93 | | -} |
94 | | --------------------------------------------------- |
95 | | -// CONSOLE |
| 66 | +`flags`:: |
| 67 | +(Optional, string) Enables optional operators for the regular expression. For |
| 68 | +valid values and more information, see <<regexp-optional-operators, Regular |
| 69 | +expression syntax>>. |
| 70 | + |
| 71 | +`max_determinized_states`:: |
| 72 | ++ |
| 73 | +-- |
| 74 | +(Optional, integer) Maximum number of |
| 75 | +https://en.wikipedia.org/wiki/Deterministic_finite_automaton[automaton states] |
| 76 | +required for the query. Default is `10000`. |
| 77 | + |
| 78 | +{es} uses https://lucene.apache.org/core/[Apache Lucene] internally to parse |
| 79 | +regular expressions. Lucene converts each regular expression to a finite |
| 80 | +automaton containing a number of determinized states. |
96 | 81 |
|
97 | | -NOTE: By default the maximum length of regex string allowed in a Regexp Query |
98 | | -is limited to 1000. You can update the `index.max_regex_length` index setting |
99 | | -to bypass this limit. |
| 82 | +You can use this parameter to prevent that conversion from unintentionally |
| 83 | +consuming too many resources. You may need to increase this limit to run complex |
| 84 | +regular expressions. |
| 85 | +-- |
100 | 86 |
|
101 | | -include::regexp-syntax.asciidoc[] |
| 87 | +`rewrite`:: |
| 88 | +(Optional, string) Method used to rewrite the query. For valid values and more |
| 89 | +information, see the <<query-dsl-multi-term-rewrite, `rewrite` parameter>>. |
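
To illustrate the example request in this diff: the `regexp` query tests each indexed term as a whole, not the original field text. The sketch below imitates that with Python's `re.fullmatch`. It is only an approximation, since Python's regular expression syntax is not the same as Lucene's, and the helper name `matching_terms` and the sample term list are hypothetical.

```python
import re


def matching_terms(pattern, terms):
    """Return the terms that the pattern matches in full.

    re.fullmatch anchors the pattern at both ends, which mirrors how a
    regexp query must match an entire term rather than a substring.
    """
    return [term for term in terms if re.fullmatch(pattern, term)]


# Hypothetical terms that an analyzer might have produced for the `user` field.
terms = ["ky", "kay", "kimchy", "sky", "kyle"]

matches = matching_terms("k.*y", terms)
print(matches)
```

Here `sky` and `kyle` are rejected because the pattern has to cover the whole term, while `ky` is accepted because `.*` can match zero characters.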