Commit 3c47df6

ES|QL: improve docs about escaping for GROK, DISSECT, LIKE, RLIKE (#115320) (#115492)
1 parent 6c3168a commit 3c47df6

11 files changed: 175 additions, 30 deletions

docs/reference/esql/esql-process-data-with-dissect-grok.asciidoc

Lines changed: 20 additions & 11 deletions
@@ -40,7 +40,7 @@ delimiter-based pattern, and extracts the specified keys as columns.
 For example, the following pattern:
 [source,txt]
 ----
-%{clientip} [%{@timestamp}] %{status}
+%{clientip} [%{@timestamp}] %{status}
 ----
 
 matches a log line of this format:
@@ -76,8 +76,8 @@ ignore certain fields, append fields, skip over padding, etc.
 ===== Terminology
 
 dissect pattern::
-the set of fields and delimiters describing the textual
-format. Also known as a dissection.
+the set of fields and delimiters describing the textual
+format. Also known as a dissection.
 The dissection is described using a set of `%{}` sections:
 `%{a} - %{b} - %{c}`
 
@@ -91,14 +91,14 @@ Any set of characters other than `%{`, `'not }'`, or `}` is a delimiter.
 key::
 +
 --
-the text between the `%{` and `}`, exclusive of the `?`, `+`, `&` prefixes
-and the ordinal suffix.
+the text between the `%{` and `}`, exclusive of the `?`, `+`, `&` prefixes
+and the ordinal suffix.
 
 Examples:
 
-* `%{?aaa}` - the key is `aaa`
-* `%{+bbb/3}` - the key is `bbb`
-* `%{&ccc}` - the key is `ccc`
+* `%{?aaa}` - the key is `aaa`
+* `%{+bbb/3}` - the key is `bbb`
+* `%{&ccc}` - the key is `ccc`
 --
 
 [[esql-dissect-examples]]
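The key definition in the hunk above (the text between `%{` and `}`, minus the `?`, `+`, `&` prefixes and the ordinal suffix) can be illustrated with a small Python sketch. The helper name `dissect_keys` is hypothetical and this is not the Elasticsearch parser; it only follows the rule as worded in the docs and ignores other modifiers such as `->`.

```python
import re

# Hypothetical helper, not part of Elasticsearch: finds every %{...} section.
_SECTION = re.compile(r"%\{([^}]*)\}")

def dissect_keys(pattern: str) -> list:
    """Return the key of each %{...} section in a dissect pattern."""
    keys = []
    for raw in _SECTION.findall(pattern):
        name = raw.lstrip("?+&")           # drop the ?, +, & modifier prefixes
        name = re.sub(r"/\d+$", "", name)  # drop an ordinal suffix such as /3
        keys.append(name)
    return keys

print(dissect_keys("%{?aaa} %{+bbb/3} %{&ccc}"))  # ['aaa', 'bbb', 'ccc']
```

The three sections are the same ones used in the "Examples" list of the hunk.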
@@ -218,7 +218,7 @@ Putting it together as an {esql} query:
 
 [source.merge.styled,esql]
 ----
-include::{esql-specs}/docs.csv-spec[tag=grokWithEscape]
+include::{esql-specs}/docs.csv-spec[tag=grokWithEscapeTripleQuotes]
 ----
 
 `GROK` adds the following columns to the input table:
@@ -239,15 +239,24 @@ with a `\`. For example, in the earlier pattern:
 %{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}
 ----
 
-In {esql} queries, the backslash character itself is a special character that
+In {esql} queries, when a string is enclosed in a single pair of double quotes (`"`), the backslash character itself is a special character that
 needs to be escaped with another `\`. For this example, the corresponding {esql}
 query becomes:
 [source.merge.styled,esql]
 ----
 include::{esql-specs}/docs.csv-spec[tag=grokWithEscape]
 ----
+
+For this reason, it is generally more convenient to use triple quotes `"""` for GROK patterns,
+which do not require backslashes to be escaped.
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/docs.csv-spec[tag=grokWithEscapeTripleQuotes]
+----
 ====
 
+
 [[esql-grok-patterns]]
 ===== Grok patterns
 
@@ -318,4 +327,4 @@ as the `GROK` command.
 The `GROK` command does not support configuring <<custom-patterns,custom
 patterns>>, or <<trace-match,multiple patterns>>. The `GROK` command is not
 subject to <<grok-watchdog,Grok watchdog settings>>.
-// end::grok-limitations[]
+// end::grok-limitations[]
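To see why the triple-quoted form needs fewer backslashes, here is a rough Python model of the two string-literal styles. It is a sketch under assumptions, not the ES|QL lexer: the function name `literal_value` and the escape set (`\\`, `\"`, `\n`, `\r`, `\t`) are assumed for illustration, as is the rule that an unknown escape is an error.

```python
# Simplified model (an assumption, not the real ES|QL lexer): inside "..."
# a backslash starts an escape sequence; inside """...""" it is kept verbatim.
ESCAPES = {"\\": "\\", '"': '"', "n": "\n", "r": "\r", "t": "\t"}

def literal_value(source: str) -> str:
    """Return the string value denoted by a quoted literal as typed in a query."""
    if len(source) >= 6 and source.startswith('"""') and source.endswith('"""'):
        return source[3:-3]  # triple quotes: no escape processing
    if not (len(source) >= 2 and source.startswith('"') and source.endswith('"')):
        raise ValueError("not a quoted literal")
    body, out, i = source[1:-1], [], 0
    while i < len(body):
        if body[i] == "\\":
            if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
                raise ValueError(f"invalid escape at offset {i}")
            out.append(ESCAPES[body[i + 1]])  # consume backslash + escaped char
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

# The same GROK pattern typed both ways; backslashes are doubled only in the first.
single = r'"%{IP:ip} \\[%{TIMESTAMP_ISO8601:@timestamp}\\] %{GREEDYDATA:status}"'
triple = r'"""%{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}"""'
print(literal_value(single) == literal_value(triple))  # True
```

In this model both literals hand the same pattern text to GROK, which then applies its own `\[` escaping.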

Generated files (1 addition & 1 deletion each; diffs not rendered by default on GitHub):

docs/reference/esql/functions/kibana/definition/like.json
docs/reference/esql/functions/kibana/definition/rlike.json
docs/reference/esql/functions/kibana/docs/like.md
docs/reference/esql/functions/kibana/docs/rlike.md

docs/reference/esql/functions/like.asciidoc

Lines changed: 16 additions & 0 deletions
@@ -23,4 +23,20 @@ include::{esql-specs}/docs.csv-spec[tag=like]
 |===
 include::{esql-specs}/docs.csv-spec[tag=like-result]
 |===
+
+Matching the exact characters `*` and `?` requires escaping.
+The escape character is backslash `\`. Since backslash is also a special character in string literals,
+it requires further escaping.
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/string.csv-spec[tag=likeEscapingSingleQuotes]
+----
+
+To reduce the overhead of escaping, we suggest using triple-quoted strings `"""`:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/string.csv-spec[tag=likeEscapingTripleQuotes]
+----
 // end::body[]
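The pattern-level escaping for `LIKE` can be sketched in Python by translating a wildcard pattern into a regular expression. This models the documented wildcard semantics (`*` for any run of characters, `?` for one character, backslash to make the next character literal); the function names `like_to_regex` and `like` are hypothetical and the real implementation differs. The pattern passed in is the value after string-literal processing, i.e. what a triple-quoted string would contain.

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a LIKE wildcard pattern into an equivalent Python regex."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped character is literal
            i += 2
            continue
        if ch == "*":
            out.append(".*")   # zero or more characters
        elif ch == "?":
            out.append(".")    # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)

def like(value: str, pattern: str) -> bool:
    """Whole-string match of value against a LIKE pattern."""
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None

print(like("foo * bar", r"foo \* bar"))  # True: escaped * matches a literal asterisk
print(like("foo x bar", r"foo \* bar"))  # False: x is not an asterisk
print(like("foo x bar", "foo * bar"))    # True: unescaped * is a wildcard
```

The last call shows why the escape matters: without it, the pattern matches far more than the literal string.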

docs/reference/esql/functions/rlike.asciidoc

Lines changed: 16 additions & 0 deletions
@@ -18,4 +18,20 @@ include::{esql-specs}/docs.csv-spec[tag=rlike]
 |===
 include::{esql-specs}/docs.csv-spec[tag=rlike-result]
 |===
+
+Matching special characters (e.g. `.`, `*`, `(`...) requires escaping.
+The escape character is backslash `\`. Since backslash is also a special character in string literals,
+it requires further escaping.
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/string.csv-spec[tag=rlikeEscapingSingleQuotes]
+----
+
+To reduce the overhead of escaping, we suggest using triple-quoted strings `"""`:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/string.csv-spec[tag=rlikeEscapingTripleQuotes]
+----
 // end::body[]
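A similar sketch for `RLIKE`, using Python's `re` module as a stand-in. This is only an approximation: `RLIKE` uses the Elasticsearch regular expression syntax, which is not identical to Python's, but both treat `\(` as a literal parenthesis. The helpers `rlike` and `is_valid` are hypothetical names.

```python
import re

def rlike(value: str, pattern: str) -> bool:
    """Model RLIKE as a whole-string regex match (Python `re`, not Lucene syntax)."""
    return re.fullmatch(pattern, value) is not None

def is_valid(pattern: str) -> bool:
    """Report whether Python can compile the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(rlike("foo ( bar", r"foo \( bar"))  # True: \( matches a literal parenthesis
print(is_valid("foo ( bar"))              # False: an unescaped ( opens a group
print(rlike("Alejandro", ".leja.*"))      # True: . and * keep their regex meaning
```

As with `LIKE`, the pattern strings here correspond to what a triple-quoted ES|QL literal would contain; in a double-quoted literal each backslash would be doubled.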

x-pack/plugin/esql/qa/testFixtures/src/main/resources/docs.csv-spec

Lines changed: 29 additions & 13 deletions
@@ -382,7 +382,7 @@ count:long | languages:integer
 basicGrok
 // tag::basicGrok[]
 ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 [email protected] 42"
-| GROK a "%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num}"
+| GROK a """%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num}"""
 | KEEP date, ip, email, num
 // end::basicGrok[]
 ;
@@ -396,7 +396,7 @@ date:keyword | ip:keyword | email:keyword | num:keyword
 grokWithConversionSuffix
 // tag::grokWithConversionSuffix[]
 ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 [email protected] 42"
-| GROK a "%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num:int}"
+| GROK a """%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num:int}"""
 | KEEP date, ip, email, num
 // end::grokWithConversionSuffix[]
 ;
@@ -410,7 +410,7 @@ date:keyword | ip:keyword | email:keyword | num:integer
 grokWithToDatetime
 // tag::grokWithToDatetime[]
 ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 [email protected] 42"
-| GROK a "%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num:int}"
+| GROK a """%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num:int}"""
 | KEEP date, ip, email, num
 | EVAL date = TO_DATETIME(date)
 // end::grokWithToDatetime[]
@@ -436,11 +436,27 @@ ROW a = "1.2.3.4 [2023-01-23T12:15:00.000Z] Connected"
 // end::grokWithEscape-result[]
 ;
 
+
+grokWithEscapeTripleQuotes
+// tag::grokWithEscapeTripleQuotes[]
+ROW a = "1.2.3.4 [2023-01-23T12:15:00.000Z] Connected"
+| GROK a """%{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}"""
+// end::grokWithEscapeTripleQuotes[]
+| KEEP @timestamp
+;
+
+// tag::grokWithEscapeTripleQuotes-result[]
+@timestamp:keyword
+2023-01-23T12:15:00.000Z
+// end::grokWithEscapeTripleQuotes-result[]
+;
+
+
 grokWithDuplicateFieldNames
 // tag::grokWithDuplicateFieldNames[]
 FROM addresses
 | KEEP city.name, zip_code
-| GROK zip_code "%{WORD:zip_parts} %{WORD:zip_parts}"
+| GROK zip_code """%{WORD:zip_parts} %{WORD:zip_parts}"""
 // end::grokWithDuplicateFieldNames[]
 | SORT city.name
 ;
@@ -456,7 +472,7 @@ Tokyo | 100-7014 | null
 basicDissect
 // tag::basicDissect[]
 ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1"
-| DISSECT a "%{date} - %{msg} - %{ip}"
+| DISSECT a """%{date} - %{msg} - %{ip}"""
 | KEEP date, msg, ip
 // end::basicDissect[]
 ;
@@ -470,7 +486,7 @@ date:keyword | msg:keyword | ip:keyword
 dissectWithToDatetime
 // tag::dissectWithToDatetime[]
 ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1"
-| DISSECT a "%{date} - %{msg} - %{ip}"
+| DISSECT a """%{date} - %{msg} - %{ip}"""
 | KEEP date, msg, ip
 | EVAL date = TO_DATETIME(date)
 // end::dissectWithToDatetime[]
@@ -485,7 +501,7 @@ some text | 127.0.0.1 | 2023-01-23T12:15:00.000Z
 dissectRightPaddingModifier
 // tag::dissectRightPaddingModifier[]
 ROW message="1998-08-10T17:15:42 WARN"
-| DISSECT message "%{ts->} %{level}"
+| DISSECT message """%{ts->} %{level}"""
 // end::dissectRightPaddingModifier[]
 ;
 
@@ -498,7 +514,7 @@ message:keyword | ts:keyword | level:keyword
 dissectEmptyRightPaddingModifier#[skip:-8.11.2, reason:Support for empty right padding modifiers introduced in 8.11.2]
 // tag::dissectEmptyRightPaddingModifier[]
 ROW message="[1998-08-10T17:15:42] [WARN]"
-| DISSECT message "[%{ts}]%{->}[%{level}]"
+| DISSECT message """[%{ts}]%{->}[%{level}]"""
 // end::dissectEmptyRightPaddingModifier[]
 ;
 
@@ -511,7 +527,7 @@ ROW message="[1998-08-10T17:15:42] [WARN]"
 dissectAppendModifier
 // tag::dissectAppendModifier[]
 ROW message="john jacob jingleheimer schmidt"
-| DISSECT message "%{+name} %{+name} %{+name} %{+name}" APPEND_SEPARATOR=" "
+| DISSECT message """%{+name} %{+name} %{+name} %{+name}""" APPEND_SEPARATOR=" "
 // end::dissectAppendModifier[]
 ;
 
@@ -524,7 +540,7 @@ john jacob jingleheimer schmidt|john jacob jingleheimer schmidt
 dissectAppendWithOrderModifier
 // tag::dissectAppendWithOrderModifier[]
 ROW message="john jacob jingleheimer schmidt"
-| DISSECT message "%{+name/2} %{+name/4} %{+name/3} %{+name/1}" APPEND_SEPARATOR=","
+| DISSECT message """%{+name/2} %{+name/4} %{+name/3} %{+name/1}""" APPEND_SEPARATOR=","
 // end::dissectAppendWithOrderModifier[]
 ;
 
@@ -537,7 +553,7 @@ john jacob jingleheimer schmidt|schmidt,john,jingleheimer,jacob
 dissectNamedSkipKey
 // tag::dissectNamedSkipKey[]
 ROW message="1.2.3.4 - - 30/Apr/1998:22:00:52 +0000"
-| DISSECT message "%{clientip} %{?ident} %{?auth} %{@timestamp}"
+| DISSECT message """%{clientip} %{?ident} %{?auth} %{@timestamp}"""
 // end::dissectNamedSkipKey[]
 ;
 
@@ -550,7 +566,7 @@ message:keyword | clientip:keyword | @timestamp:keyword
 docsLike
 // tag::like[]
 FROM employees
-| WHERE first_name LIKE "?b*"
+| WHERE first_name LIKE """?b*"""
 | KEEP first_name, last_name
 // end::like[]
 | SORT first_name
@@ -566,7 +582,7 @@ Eberhardt |Terkki
 docsRlike
 // tag::rlike[]
 FROM employees
-| WHERE first_name RLIKE ".leja.*"
+| WHERE first_name RLIKE """.leja.*"""
 | KEEP first_name, last_name
 // end::rlike[]
 ;
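For readers unfamiliar with how a delimiter-based pattern such as the one in `basicDissect` splits a line, the following Python sketch approximates it with a regular expression. It is an illustration only: the real `DISSECT` implementation is not regex-based, the function name `dissect` is hypothetical, and modifiers (`?`, `+`, `->`, ordinals) are not handled.

```python
import re

def dissect(pattern: str, text: str):
    """Split text on the literal delimiters found between %{key} sections."""
    keys, regex, pos = [], [], 0
    for m in re.finditer(r"%\{([^}]*)\}", pattern):
        regex.append(re.escape(pattern[pos:m.start()]))  # literal delimiter text
        keys.append(m.group(1))
        regex.append("(.*?)")                            # lazy capture for the field
        pos = m.end()
    regex.append(re.escape(pattern[pos:]))               # trailing literal, if any
    match = re.fullmatch("".join(regex), text, flags=re.DOTALL)
    return dict(zip(keys, match.groups())) if match else None

row = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1"
print(dissect("%{date} - %{msg} - %{ip}", row))
```

For the `basicDissect` input this yields the `date`, `msg` and `ip` fields; a line that does not contain the delimiters returns `None`.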

x-pack/plugin/esql/qa/testFixtures/src/main/resources/string.csv-spec

Lines changed: 56 additions & 0 deletions
@@ -1800,3 +1800,59 @@ warning:Line 1:29: java.lang.IllegalArgumentException: single-value function enc
 x:keyword
 null
 ;
+
+
+likeEscapingSingleQuotes
+// tag::likeEscapingSingleQuotes[]
+ROW message = "foo * bar"
+| WHERE message LIKE "foo \\* bar"
+// end::likeEscapingSingleQuotes[]
+;
+
+// tag::likeEscapingSingleQuotes-result[]
+message:keyword
+foo * bar
+// end::likeEscapingSingleQuotes-result[]
+;
+
+
+likeEscapingTripleQuotes
+// tag::likeEscapingTripleQuotes[]
+ROW message = "foo * bar"
+| WHERE message LIKE """foo \* bar"""
+// end::likeEscapingTripleQuotes[]
+;
+
+// tag::likeEscapingTripleQuotes-result[]
+message:keyword
+foo * bar
+// end::likeEscapingTripleQuotes-result[]
+;
+
+
+rlikeEscapingSingleQuotes
+// tag::rlikeEscapingSingleQuotes[]
+ROW message = "foo ( bar"
+| WHERE message RLIKE "foo \\( bar"
+// end::rlikeEscapingSingleQuotes[]
+;
+
+// tag::rlikeEscapingSingleQuotes-result[]
+message:keyword
+foo ( bar
+// end::rlikeEscapingSingleQuotes-result[]
+;
+
+
+rlikeEscapingTripleQuotes
+// tag::rlikeEscapingTripleQuotes[]
+ROW message = "foo ( bar"
+| WHERE message RLIKE """foo \( bar"""
+// end::rlikeEscapingTripleQuotes[]
+;
+
+// tag::rlikeEscapingTripleQuotes-result[]
+message:keyword
+foo ( bar
+// end::rlikeEscapingTripleQuotes-result[]
+;

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/string/RLike.java

Lines changed: 17 additions & 1 deletion
@@ -33,7 +33,23 @@ public class RLike extends org.elasticsearch.xpack.esql.core.expression.predicat
 Use `RLIKE` to filter data based on string patterns using
 <<regexp-syntax,regular expressions>>. `RLIKE` usually acts on a field placed on
 the left-hand side of the operator, but it can also act on a constant (literal)
-expression. The right-hand side of the operator represents the pattern.""", examples = @Example(file = "docs", tag = "rlike"))
+expression. The right-hand side of the operator represents the pattern.""", detailedDescription = """
+Matching special characters (e.g. `.`, `*`, `(`...) requires escaping.
+The escape character is backslash `\\`. Since backslash is also a special character in string literals,
+it requires further escaping.
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/string.csv-spec[tag=rlikeEscapingSingleQuotes]
+----
+
+To reduce the overhead of escaping, we suggest using triple-quoted strings `\"\"\"`:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/string.csv-spec[tag=rlikeEscapingTripleQuotes]
+----
+""", examples = @Example(file = "docs", tag = "rlike"))
 public RLike(
 Source source,
 @Param(name = "str", type = { "keyword", "text" }, description = "A literal value.") Expression value,
