Commit 711d259

ES|QL: improve docs about escaping for GROK, DISSECT, LIKE, RLIKE (elastic#115320) (elastic#115498)
1 parent 0b703e9 commit 711d259

7 files changed: +180 -26 lines changed

docs/reference/esql/esql-process-data-with-dissect-grok.asciidoc

Lines changed: 20 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,7 @@ delimiter-based pattern, and extracts the specified keys as columns.
4040
For example, the following pattern:
4141
[source,txt]
4242
----
43-
%{clientip} [%{@timestamp}] %{status}
43+
%{clientip} [%{@timestamp}] %{status}
4444
----
4545

4646
matches a log line of this format:
@@ -76,8 +76,8 @@ ignore certain fields, append fields, skip over padding, etc.
7676
===== Terminology
7777

7878
dissect pattern::
79-
the set of fields and delimiters describing the textual
80-
format. Also known as a dissection.
79+
the set of fields and delimiters describing the textual
80+
format. Also known as a dissection.
8181
The dissection is described using a set of `%{}` sections:
8282
`%{a} - %{b} - %{c}`
8383

@@ -91,14 +91,14 @@ Any set of characters other than `%{`, `'not }'`, or `}` is a delimiter.
9191
key::
9292
+
9393
--
94-
the text between the `%{` and `}`, exclusive of the `?`, `+`, `&` prefixes
95-
and the ordinal suffix.
94+
the text between the `%{` and `}`, exclusive of the `?`, `+`, `&` prefixes
95+
and the ordinal suffix.
9696

9797
Examples:
9898

99-
* `%{?aaa}` - the key is `aaa`
100-
* `%{+bbb/3}` - the key is `bbb`
101-
* `%{&ccc}` - the key is `ccc`
99+
* `%{?aaa}` - the key is `aaa`
100+
* `%{+bbb/3}` - the key is `bbb`
101+
* `%{&ccc}` - the key is `ccc`
102102
--
103103

104104
[[esql-dissect-examples]]
@@ -218,7 +218,7 @@ Putting it together as an {esql} query:
218218

219219
[source.merge.styled,esql]
220220
----
221-
include::{esql-specs}/docs.csv-spec[tag=grokWithEscape]
221+
include::{esql-specs}/docs.csv-spec[tag=grokWithEscapeTripleQuotes]
222222
----
223223

224224
`GROK` adds the following columns to the input table:
@@ -239,15 +239,24 @@ with a `\`. For example, in the earlier pattern:
239239
%{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}
240240
----
241241
242-
In {esql} queries, the backslash character itself is a special character that
242+
In {esql} queries, when using single quotes for strings, the backslash character itself is a special character that
243243
needs to be escaped with another `\`. For this example, the corresponding {esql}
244244
query becomes:
245245
[source.merge.styled,esql]
246246
----
247247
include::{esql-specs}/docs.csv-spec[tag=grokWithEscape]
248248
----
249+
250+
For this reason, in general it is more convenient to use triple quotes `"""` for GROK patterns,
251+
that do not require escaping for backslash.
252+
253+
[source.merge.styled,esql]
254+
----
255+
include::{esql-specs}/docs.csv-spec[tag=grokWithEscapeTripleQuotes]
256+
----
249257
====
250258

259+
251260
[[esql-grok-patterns]]
252261
===== Grok patterns
253262

@@ -318,4 +327,4 @@ as the `GROK` command.
318327
The `GROK` command does not support configuring <<custom-patterns,custom
319328
patterns>>, or <<trace-match,multiple patterns>>. The `GROK` command is not
320329
subject to <<grok-watchdog,Grok watchdog settings>>.
321-
// end::grok-limitations[]
330+
// end::grok-limitations[]
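
As a rough illustration of why the square brackets in the GROK pattern above need a backslash, the following Java sketch uses `java.util.regex` as a stand-in for the grok regular-expression engine. The class name `BracketEscape`, the group names, and the simplified sub-patterns that replace `%{IP}`, `%{TIMESTAMP_ISO8601}` and `%{GREEDYDATA}` are assumptions made for this example and are not part of the commit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketEscape {
    // Simplified stand-ins for %{IP:ip}, %{TIMESTAMP_ISO8601:@timestamp}
    // and %{GREEDYDATA:status}. An unescaped '[' would open a character
    // class, so the literal brackets are written as \[ and \] in the regex.
    // The Java string literal then doubles every backslash.
    public static final Pattern LINE =
        Pattern.compile("(?<ip>\\S+) \\[(?<ts>[^\\]]+)\\] (?<status>.*)");

    // Returns the text captured between the literal brackets, or null
    // when the line does not have the expected shape.
    public static String timestamp(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group("ts") : null;
    }

    public static void main(String[] args) {
        // prints 2023-01-23T12:15:00.000Z
        System.out.println(timestamp("1.2.3.4 [2023-01-23T12:15:00.000Z] Connected"));
    }
}
```

The doubled backslashes in the Java literal play the same role as the extra escaping a double-quoted {esql} string requires; a triple-quoted {esql} string avoids that second layer.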

docs/reference/esql/functions/like.asciidoc

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,4 +23,20 @@ include::{esql-specs}/docs.csv-spec[tag=like]
2323
|===
2424
include::{esql-specs}/docs.csv-spec[tag=like-result]
2525
|===
26+
27+
Matching the exact characters `*` and `.` will require escaping.
28+
The escape character is backslash `\`. Since also backslash is a special character in string literals,
29+
it will require further escaping.
30+
31+
[source.merge.styled,esql]
32+
----
33+
include::{esql-specs}/string.csv-spec[tag=likeEscapingSingleQuotes]
34+
----
35+
36+
To reduce the overhead of escaping, we suggest using triple quotes strings `"""`
37+
38+
[source.merge.styled,esql]
39+
----
40+
include::{esql-specs}/string.csv-spec[tag=likeEscapingTripleQuotes]
41+
----
2642
// end::body[]

docs/reference/esql/functions/rlike.asciidoc

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,4 +18,20 @@ include::{esql-specs}/docs.csv-spec[tag=rlike]
1818
|===
1919
include::{esql-specs}/docs.csv-spec[tag=rlike-result]
2020
|===
21+
22+
Matching special characters (eg. `.`, `*`, `(`...) will require escaping.
23+
The escape character is backslash `\`. Since also backslash is a special character in string literals,
24+
it will require further escaping.
25+
26+
[source.merge.styled,esql]
27+
----
28+
include::{esql-specs}/string.csv-spec[tag=rlikeEscapingSingleQuotes]
29+
----
30+
31+
To reduce the overhead of escaping, we suggest using triple quotes strings `"""`
32+
33+
[source.merge.styled,esql]
34+
----
35+
include::{esql-specs}/string.csv-spec[tag=rlikeEscapingTripleQuotes]
36+
----
2137
// end::body[]

x-pack/plugin/esql/qa/testFixtures/src/main/resources/docs.csv-spec

Lines changed: 29 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -382,7 +382,7 @@ count:long | languages:integer
382382
basicGrok
383383
// tag::basicGrok[]
384384
ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 [email protected] 42"
385-
| GROK a "%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num}"
385+
| GROK a """%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num}"""
386386
| KEEP date, ip, email, num
387387
// end::basicGrok[]
388388
;
@@ -396,7 +396,7 @@ date:keyword | ip:keyword | email:keyword | num:keyword
396396
grokWithConversionSuffix
397397
// tag::grokWithConversionSuffix[]
398398
ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 [email protected] 42"
399-
| GROK a "%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num:int}"
399+
| GROK a """%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num:int}"""
400400
| KEEP date, ip, email, num
401401
// end::grokWithConversionSuffix[]
402402
;
@@ -410,7 +410,7 @@ date:keyword | ip:keyword | email:keyword | num:integer
410410
grokWithToDatetime
411411
// tag::grokWithToDatetime[]
412412
ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 [email protected] 42"
413-
| GROK a "%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num:int}"
413+
| GROK a """%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num:int}"""
414414
| KEEP date, ip, email, num
415415
| EVAL date = TO_DATETIME(date)
416416
// end::grokWithToDatetime[]
@@ -436,11 +436,27 @@ ROW a = "1.2.3.4 [2023-01-23T12:15:00.000Z] Connected"
436436
// end::grokWithEscape-result[]
437437
;
438438

439+
440+
grokWithEscapeTripleQuotes
441+
// tag::grokWithEscapeTripleQuotes[]
442+
ROW a = "1.2.3.4 [2023-01-23T12:15:00.000Z] Connected"
443+
| GROK a """%{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}"""
444+
// end::grokWithEscapeTripleQuotes[]
445+
| KEEP @timestamp
446+
;
447+
448+
// tag::grokWithEscapeTripleQuotes-result[]
449+
@timestamp:keyword
450+
2023-01-23T12:15:00.000Z
451+
// end::grokWithEscapeTripleQuotes-result[]
452+
;
453+
454+
439455
grokWithDuplicateFieldNames
440456
// tag::grokWithDuplicateFieldNames[]
441457
FROM addresses
442458
| KEEP city.name, zip_code
443-
| GROK zip_code "%{WORD:zip_parts} %{WORD:zip_parts}"
459+
| GROK zip_code """%{WORD:zip_parts} %{WORD:zip_parts}"""
444460
// end::grokWithDuplicateFieldNames[]
445461
| SORT city.name
446462
;
@@ -456,7 +472,7 @@ Tokyo | 100-7014 | null
456472
basicDissect
457473
// tag::basicDissect[]
458474
ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1"
459-
| DISSECT a "%{date} - %{msg} - %{ip}"
475+
| DISSECT a """%{date} - %{msg} - %{ip}"""
460476
| KEEP date, msg, ip
461477
// end::basicDissect[]
462478
;
@@ -470,7 +486,7 @@ date:keyword | msg:keyword | ip:keyword
470486
dissectWithToDatetime
471487
// tag::dissectWithToDatetime[]
472488
ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1"
473-
| DISSECT a "%{date} - %{msg} - %{ip}"
489+
| DISSECT a """%{date} - %{msg} - %{ip}"""
474490
| KEEP date, msg, ip
475491
| EVAL date = TO_DATETIME(date)
476492
// end::dissectWithToDatetime[]
@@ -485,7 +501,7 @@ some text | 127.0.0.1 | 2023-01-23T12:15:00.000Z
485501
dissectRightPaddingModifier
486502
// tag::dissectRightPaddingModifier[]
487503
ROW message="1998-08-10T17:15:42 WARN"
488-
| DISSECT message "%{ts->} %{level}"
504+
| DISSECT message """%{ts->} %{level}"""
489505
// end::dissectRightPaddingModifier[]
490506
;
491507

@@ -498,7 +514,7 @@ message:keyword | ts:keyword | level:keyword
498514
dissectEmptyRightPaddingModifier#[skip:-8.11.2, reason:Support for empty right padding modifiers introduced in 8.11.2]
499515
// tag::dissectEmptyRightPaddingModifier[]
500516
ROW message="[1998-08-10T17:15:42] [WARN]"
501-
| DISSECT message "[%{ts}]%{->}[%{level}]"
517+
| DISSECT message """[%{ts}]%{->}[%{level}]"""
502518
// end::dissectEmptyRightPaddingModifier[]
503519
;
504520

@@ -511,7 +527,7 @@ ROW message="[1998-08-10T17:15:42] [WARN]"
511527
dissectAppendModifier
512528
// tag::dissectAppendModifier[]
513529
ROW message="john jacob jingleheimer schmidt"
514-
| DISSECT message "%{+name} %{+name} %{+name} %{+name}" APPEND_SEPARATOR=" "
530+
| DISSECT message """%{+name} %{+name} %{+name} %{+name}""" APPEND_SEPARATOR=" "
515531
// end::dissectAppendModifier[]
516532
;
517533

@@ -524,7 +540,7 @@ john jacob jingleheimer schmidt|john jacob jingleheimer schmidt
524540
dissectAppendWithOrderModifier
525541
// tag::dissectAppendWithOrderModifier[]
526542
ROW message="john jacob jingleheimer schmidt"
527-
| DISSECT message "%{+name/2} %{+name/4} %{+name/3} %{+name/1}" APPEND_SEPARATOR=","
543+
| DISSECT message """%{+name/2} %{+name/4} %{+name/3} %{+name/1}""" APPEND_SEPARATOR=","
528544
// end::dissectAppendWithOrderModifier[]
529545
;
530546

@@ -537,7 +553,7 @@ john jacob jingleheimer schmidt|schmidt,john,jingleheimer,jacob
537553
dissectNamedSkipKey
538554
// tag::dissectNamedSkipKey[]
539555
ROW message="1.2.3.4 - - 30/Apr/1998:22:00:52 +0000"
540-
| DISSECT message "%{clientip} %{?ident} %{?auth} %{@timestamp}"
556+
| DISSECT message """%{clientip} %{?ident} %{?auth} %{@timestamp}"""
541557
// end::dissectNamedSkipKey[]
542558
;
543559

@@ -550,7 +566,7 @@ message:keyword | clientip:keyword | @timestamp:keyword
550566
docsLike
551567
// tag::like[]
552568
FROM employees
553-
| WHERE first_name LIKE "?b*"
569+
| WHERE first_name LIKE """?b*"""
554570
| KEEP first_name, last_name
555571
// end::like[]
556572
| SORT first_name
@@ -566,7 +582,7 @@ Eberhardt |Terkki
566582
docsRlike
567583
// tag::rlike[]
568584
FROM employees
569-
| WHERE first_name RLIKE ".leja.*"
585+
| WHERE first_name RLIKE """.leja.*"""
570586
| KEEP first_name, last_name
571587
// end::rlike[]
572588
;

x-pack/plugin/esql/qa/testFixtures/src/main/resources/string.csv-spec

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1595,4 +1595,57 @@ emp_no:integer | languages:integer | first_name:keyword
15951595
10004 | 5 | ChirstianChirstianChirstianChirstianChirstian
15961596
;
15971597

1598+
likeEscapingSingleQuotes
1599+
// tag::likeEscapingSingleQuotes[]
1600+
ROW message = "foo * bar"
1601+
| WHERE message LIKE "foo \\* bar"
1602+
// end::likeEscapingSingleQuotes[]
1603+
;
1604+
1605+
// tag::likeEscapingSingleQuotes-result[]
1606+
message:keyword
1607+
foo * bar
1608+
// end::likeEscapingSingleQuotes-result[]
1609+
;
1610+
1611+
1612+
likeEscapingTripleQuotes
1613+
// tag::likeEscapingTripleQuotes[]
1614+
ROW message = "foo * bar"
1615+
| WHERE message LIKE """foo \* bar"""
1616+
// end::likeEscapingTripleQuotes[]
1617+
;
1618+
1619+
// tag::likeEscapingTripleQuotes-result[]
1620+
message:keyword
1621+
foo * bar
1622+
// end::likeEscapingTripleQuotes-result[]
1623+
;
1624+
15981625

1626+
rlikeEscapingSingleQuotes
1627+
// tag::rlikeEscapingSingleQuotes[]
1628+
ROW message = "foo ( bar"
1629+
| WHERE message RLIKE "foo \\( bar"
1630+
// end::rlikeEscapingSingleQuotes[]
1631+
;
1632+
1633+
// tag::rlikeEscapingSingleQuotes-result[]
1634+
message:keyword
1635+
foo ( bar
1636+
// end::rlikeEscapingSingleQuotes-result[]
1637+
;
1638+
1639+
1640+
rlikeEscapingTripleQuotes
1641+
// tag::rlikeEscapingTripleQuotes[]
1642+
ROW message = "foo ( bar"
1643+
| WHERE message RLIKE """foo \( bar"""
1644+
// end::rlikeEscapingTripleQuotes[]
1645+
;
1646+
1647+
// tag::rlikeEscapingTripleQuotes-result[]
1648+
message:keyword
1649+
foo ( bar
1650+
// end::rlikeEscapingTripleQuotes-result[]
1651+
;
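
The two escaping layers exercised by the specs above can be modelled in a few lines of Java. This is only a sketch under stated assumptions, not the ES|QL parser: `unescapeQuoted` is a hypothetical helper that assumes a double-quoted literal collapses each `\\` pair into one backslash (other escape sequences are ignored here), a triple-quoted literal is assumed to be taken verbatim, and `java.util.regex` stands in for the RLIKE regular-expression engine.

```java
import java.util.regex.Pattern;

public class EscapingLayers {
    // Hypothetical model of a double-quoted literal: each "\\" pair in the
    // query text becomes a single backslash. Everything else is copied as-is.
    public static String unescapeQuoted(String queryText) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < queryText.length(); i++) {
            char c = queryText.charAt(i);
            boolean pair = c == '\\'
                && i + 1 < queryText.length()
                && queryText.charAt(i + 1) == '\\';
            out.append(c);
            if (pair) {
                i++; // skip the second backslash of the pair
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Characters typed between double quotes in the query: foo \\( bar
        String quoted = "foo \\\\( bar";
        // Characters typed between triple quotes in the query: foo \( bar
        String tripleQuoted = "foo \\( bar";

        String pattern = unescapeQuoted(quoted);
        // Both spellings are expected to produce the same pattern text.
        System.out.println(pattern.equals(tripleQuoted));
        // The escaped parenthesis matches a literal '(' in the input.
        System.out.println(Pattern.matches(pattern, "foo ( bar"));
    }
}
```

Under these assumptions both checks print `true`: the double-quoted spelling needs one more backslash than the triple-quoted one to arrive at the same pattern.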

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/string/RLike.java

Lines changed: 29 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,9 @@
1616
import org.elasticsearch.xpack.esql.core.tree.NodeInfo;
1717
import org.elasticsearch.xpack.esql.core.tree.Source;
1818
import org.elasticsearch.xpack.esql.evaluator.mapper.EvaluatorMapper;
19+
import org.elasticsearch.xpack.esql.expression.function.Example;
20+
import org.elasticsearch.xpack.esql.expression.function.FunctionInfo;
21+
import org.elasticsearch.xpack.esql.expression.function.Param;
1922
import org.elasticsearch.xpack.esql.io.stream.PlanStreamInput;
2023

2124
import java.io.IOException;
@@ -27,7 +30,32 @@
2730
public class RLike extends org.elasticsearch.xpack.esql.core.expression.predicate.regex.RLike implements EvaluatorMapper {
2831
public static final NamedWriteableRegistry.Entry ENTRY = new NamedWriteableRegistry.Entry(Expression.class, "RLike", RLike::new);
2932

30-
public RLike(Source source, Expression value, RLikePattern pattern) {
33+
@FunctionInfo(returnType = "boolean", description = """
34+
Use `RLIKE` to filter data based on string patterns using
35+
<<regexp-syntax,regular expressions>>. `RLIKE` usually acts on a field placed on
36+
the left-hand side of the operator, but it can also act on a constant (literal)
37+
expression. The right-hand side of the operator represents the pattern.""", detailedDescription = """
38+
Matching special characters (eg. `.`, `*`, `(`...) will require escaping.
39+
The escape character is backslash `\\`. Since also backslash is a special character in string literals,
40+
it will require further escaping.
41+
42+
[source.merge.styled,esql]
43+
----
44+
include::{esql-specs}/string.csv-spec[tag=rlikeEscapingSingleQuotes]
45+
----
46+
47+
To reduce the overhead of escaping, we suggest using triple quotes strings `\"\"\"`
48+
49+
[source.merge.styled,esql]
50+
----
51+
include::{esql-specs}/string.csv-spec[tag=rlikeEscapingTripleQuotes]
52+
----
53+
""", examples = @Example(file = "docs", tag = "rlike"))
54+
public RLike(
55+
Source source,
56+
@Param(name = "str", type = { "keyword", "text" }, description = "A literal value.") Expression value,
57+
@Param(name = "pattern", type = { "keyword", "text" }, description = "A regular expression.") RLikePattern pattern
58+
) {
3159
super(source, value, pattern);
3260
}
3361

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/string/WildcardLike.java

Lines changed: 17 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -44,7 +44,23 @@ also act on a constant (literal) expression. The right-hand side of the operator
4444
The following wildcard characters are supported:
4545
4646
* `*` matches zero or more characters.
47-
* `?` matches one character.""", examples = @Example(file = "docs", tag = "like"))
47+
* `?` matches one character.""", detailedDescription = """
48+
Matching the exact characters `*` and `.` will require escaping.
49+
The escape character is backslash `\\`. Since also backslash is a special character in string literals,
50+
it will require further escaping.
51+
52+
[source.merge.styled,esql]
53+
----
54+
include::{esql-specs}/string.csv-spec[tag=likeEscapingSingleQuotes]
55+
----
56+
57+
To reduce the overhead of escaping, we suggest using triple quotes strings `\"\"\"`
58+
59+
[source.merge.styled,esql]
60+
----
61+
include::{esql-specs}/string.csv-spec[tag=likeEscapingTripleQuotes]
62+
----
63+
""", examples = @Example(file = "docs", tag = "like"))
4864
public WildcardLike(
4965
Source source,
5066
@Param(name = "str", type = { "keyword", "text" }) Expression left,
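
The `\\` and `\"\"\"` sequences in the annotation text above follow from how Java text blocks work: unlike a triple-quoted {esql} string, a Java text block still processes escape sequences. The sketch below shows this with a hypothetical class `TextBlockEscapes` and assumes Java 15 or later, where text blocks are a standard feature.

```java
public class TextBlockEscapes {
    // Inside a text block, \\ produces one backslash and \"\"\" produces a
    // literal run of three double quotes without closing the block.
    public static final String DESCRIPTION = """
        The escape character is backslash `\\`.
        We suggest using triple quotes strings `\"\"\"`""";

    public static void main(String[] args) {
        // The printed text contains a single backslash and three plain quotes.
        System.out.println(DESCRIPTION);
    }
}
```

This is why the generated asciidoc shows a single `\` and a plain `"""` even though the Java source spells them with extra backslashes.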
