
Commit 89faf44

[DOCS] Add ES|QL 'getting started' code snippets to CSV tests (#102653)
* [DOCS] Add ES|QL 'getting started' code snippets to CSV tests
* Change dots in columns names into underscores
* Add LIMIT 0 to ENRICH test
* Move code snippets out of docs.csv-spec
* Replace code snippets by includes
* Add missing semicolon
1 parent 56170b4 commit 89faf44
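
The includes in this commit rely on tagged regions: each csv-spec test wraps the documented part of its query in `// tag::NAME[]` and `// end::NAME[]` comments, and the AsciiDoc `include::...[tag=NAME]` directive pulls in only those lines, so test-only trailers such as `| LIMIT 0` stay out of the docs. A minimal Java sketch of that selection logic follows. `TagRegionSketch` and `extractTag` are hypothetical names used for illustration only; they are not part of Elasticsearch or Asciidoctor, and real Asciidoctor tag handling supports more than this sketch does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the Asciidoctor implementation.
public class TagRegionSketch {

    /**
     * Returns the lines strictly between "// tag::<tag>[]" and
     * "// end::<tag>[]". Lines outside the region are ignored.
     */
    public static List<String> extractTag(List<String> lines, String tag) {
        String open = "// tag::" + tag + "[]";
        String close = "// end::" + tag + "[]";
        List<String> region = new ArrayList<>();
        boolean inside = false;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.equals(open)) {
                inside = true;          // start collecting after the opening marker
            } else if (trimmed.equals(close)) {
                inside = false;         // stop collecting at the closing marker
            } else if (inside) {
                region.add(line);
            }
        }
        return region;
    }

    public static void main(String[] args) {
        // Mirrors the shape of the docsGettingStartedLimit test in docs.csv-spec.
        List<String> spec = List.of(
            "docsGettingStartedLimit",
            "// tag::gs-limit[]",
            "FROM sample_data",
            "| LIMIT 3",
            "// end::gs-limit[]",
            "| LIMIT 0",
            ";"
        );
        // Prints only the two lines inside the tagged region.
        extractTag(spec, "gs-limit").forEach(System.out::println);
    }
}
```

With this layout the test still runs the full query, including the trailing `| LIMIT 0`, while the documentation shows only the tagged lines.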


16 files changed (+313 -53 lines changed)


docs/reference/esql/esql-get-started.asciidoc

Lines changed: 20 additions & 50 deletions
@@ -39,7 +39,7 @@ This query returns up to 500 documents from the `sample_data` index:
 
 [source,esql]
 ----
-FROM sample_data
+include::{esql-specs}/docs.csv-spec[tag=gs-from]
 ----
 
 Each column corresponds to a field, and can be accessed by the name of that
@@ -52,7 +52,7 @@ previous one:
 
 [source,esql]
 ----
-from sample_data
+include::{esql-specs}/docs.csv-spec[tag=gs-from-lowercase]
 ----
 ====
 
@@ -73,8 +73,7 @@ that are returned, up to a maximum of 10,000 rows:
 
 [source,esql]
 ----
-FROM sample_data
-| LIMIT 3
+include::{esql-specs}/docs.csv-spec[tag=gs-limit]
 ----
 
 [TIP]
@@ -84,7 +83,7 @@ have to. The following query is identical to the previous one:
 
 [source,esql]
 ----
-FROM sample_data | LIMIT 3
+include::{esql-specs}/docs.csv-spec[tag=gs-limit-one-line]
 ----
 ====
 
@@ -100,8 +99,7 @@ sort rows on one or more columns:
 
 [source,esql]
 ----
-FROM sample_data
-| SORT @timestamp DESC
+include::{esql-specs}/docs.csv-spec[tag=gs-sort]
 ----
 
 [discrete]
@@ -113,16 +111,14 @@ events with a duration longer than 5ms:
 
 [source,esql]
 ----
-FROM sample_data
-| WHERE event.duration > 5000000
+include::{esql-specs}/where.csv-spec[tag=gs-where]
 ----
 
 `WHERE` supports several <<esql-operators,operators>>. For example, you can use <<esql-like-operator>> to run a wildcard query against the `message` column:
 
 [source,esql]
 ----
-FROM sample_data
-| WHERE message LIKE "Connected*"
+include::{esql-specs}/where-like.csv-spec[tag=gs-like]
 ----
 
 [discrete]
@@ -149,9 +145,7 @@ result set to 3 rows:
 
 [source,esql]
 ----
-FROM sample_data
-| SORT @timestamp DESC
-| LIMIT 3
+include::{esql-specs}/docs.csv-spec[tag=gs-chaining]
 ----
 
 NOTE: The order of processing commands is important. First limiting the result
@@ -169,8 +163,7 @@ other words: `event.duration` converted from nanoseconds to milliseconds.
 
 [source,esql]
 ----
-FROM sample_data
-| EVAL duration_ms = event.duration / 1000000.0
+include::{esql-specs}/eval.csv-spec[tag=gs-eval]
 ----
 
 `EVAL` supports several <<esql-functions,functions>>. For example, to round a
@@ -179,8 +172,7 @@ number to the closest number with the specified number of digits, use the
 
 [source,esql]
 ----
-FROM sample_data
-| EVAL duration_ms = ROUND(event.duration / 1000000.0, 1)
+include::{esql-specs}/eval.csv-spec[tag=gs-round]
 ----
 
 [discrete]
@@ -193,25 +185,22 @@ example, the median duration:
 
 [source,esql]
 ----
-FROM sample_data
-| STATS median_duration = MEDIAN(event.duration)
+include::{esql-specs}/stats.csv-spec[tag=gs-stats]
 ----
 
 You can calculate multiple stats with one command:
 
 [source,esql]
 ----
-FROM sample_data
-| STATS median_duration = MEDIAN(event.duration), max_duration = MAX(event.duration)
+include::{esql-specs}/stats.csv-spec[tag=gs-two-stats]
 ----
 
 Use `BY` to group calculated stats by one or more columns. For example, to
 calculate the median duration per client IP:
 
 [source,esql]
 ----
-FROM sample_data
-| STATS median_duration = MEDIAN(event.duration) BY client.ip
+include::{esql-specs}/stats.csv-spec[tag=gs-stats-by]
 ----
 
 [discrete]
@@ -227,30 +216,22 @@ For example, to create hourly buckets for the data on October 23rd:
 
 [source,esql]
 ----
-FROM sample_data
-| KEEP @timestamp
-| EVAL bucket = AUTO_BUCKET (@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
+include::{esql-specs}/date.csv-spec[tag=gs-auto_bucket]
 ----
 
 Combine `AUTO_BUCKET` with <<esql-stats-by>> to create a histogram. For example,
 to count the number of events per hour:
 
 [source,esql]
 ----
-FROM sample_data
-| KEEP @timestamp, event.duration
-| EVAL bucket = AUTO_BUCKET (@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
-| STATS COUNT(*) BY bucket
+include::{esql-specs}/date.csv-spec[tag=gs-auto_bucket-stats-by]
 ----
 
 Or the median duration per hour:
 
 [source,esql]
 ----
-FROM sample_data
-| KEEP @timestamp, event.duration
-| EVAL bucket = AUTO_BUCKET (@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
-| STATS median_duration = MEDIAN(event.duration) BY bucket
+include::{esql-specs}/date.csv-spec[tag=gs-auto_bucket-stats-by-median]
 ----
 
 [discrete]
@@ -273,10 +254,7 @@ command:
 
 [source,esql]
 ----
-FROM sample_data
-| KEEP @timestamp, client.ip, event.duration
-| EVAL client.ip = TO_STRING(client.ip)
-| ENRICH clientip_policy ON client.ip WITH env
+include::{esql-specs}/enrich.csv-spec[tag=gs-enrich]
 ----
 
 You can use the new `env` column that's added by the `ENRICH` command in
@@ -285,11 +263,7 @@ environment:
 
 [source,esql]
 ----
-FROM sample_data
-| KEEP @timestamp, client.ip, event.duration
-| EVAL client.ip = TO_STRING(client.ip)
-| ENRICH clientip_policy ON client.ip WITH env
-| STATS median_duration = MEDIAN(event.duration) BY env
+include::{esql-specs}/enrich.csv-spec[tag=gs-enrich-stats-by]
 ----
 
 For more about data enrichment with {esql}, refer to <<esql-enrich-data>>.
@@ -321,8 +295,7 @@ string, you can use the following `DISSECT` command:
 
 [source,esql]
 ----
-FROM sample_data
-| DISSECT message "Connected to %{server.ip}"
+include::{esql-specs}/dissect.csv-spec[tag=gs-dissect]
 ----
 
 This adds a `server.ip` column to those rows that have a `message` that matches
@@ -334,10 +307,7 @@ has accepted:
 
 [source,esql]
 ----
-FROM sample_data
-| WHERE STARTS_WITH(message, "Connected to")
-| DISSECT message "Connected to %{server.ip}"
-| STATS COUNT(*) BY server.ip
+include::{esql-specs}/dissect.csv-spec[tag=gs-dissect-stats-by]
 ----
 
 For more about data processing with {esql}, refer to

x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvTestsDataLoader.java

Lines changed: 9 additions & 2 deletions
@@ -52,6 +52,8 @@ public class CsvTestsDataLoader {
     private static final TestsDataset APPS = new TestsDataset("apps", "mapping-apps.json", "apps.csv");
     private static final TestsDataset LANGUAGES = new TestsDataset("languages", "mapping-languages.json", "languages.csv");
     private static final TestsDataset UL_LOGS = new TestsDataset("ul_logs", "mapping-ul_logs.json", "ul_logs.csv");
+    private static final TestsDataset SAMPLE_DATA = new TestsDataset("sample_data", "mapping-sample_data.json", "sample_data.csv");
+    private static final TestsDataset CLIENT_IPS = new TestsDataset("clientips", "mapping-clientips.json", "clientips.csv");
     private static final TestsDataset AIRPORTS = new TestsDataset("airports", "mapping-airports.json", "airports.csv");
     private static final TestsDataset AIRPORTS_WEB = new TestsDataset("airports_web", "mapping-airports_web.json", "airports_web.csv");
 
@@ -66,15 +68,20 @@ public class CsvTestsDataLoader {
         LANGUAGES,
         UL_LOGS.indexName,
         UL_LOGS,
+        SAMPLE_DATA.indexName,
+        SAMPLE_DATA,
+        CLIENT_IPS.indexName,
+        CLIENT_IPS,
         AIRPORTS.indexName,
         AIRPORTS,
         AIRPORTS_WEB.indexName,
         AIRPORTS_WEB
     );
 
-    private static final EnrichConfig LANGUAGES_ENRICH = new EnrichConfig("languages_policy", "enricy-policy-languages.json");
+    private static final EnrichConfig LANGUAGES_ENRICH = new EnrichConfig("languages_policy", "enrich-policy-languages.json");
+    private static final EnrichConfig CLIENT_IPS_ENRICH = new EnrichConfig("clientip_policy", "enrich-policy-clientips.json");
 
-    public static final List<EnrichConfig> ENRICH_POLICIES = List.of(LANGUAGES_ENRICH);
+    public static final List<EnrichConfig> ENRICH_POLICIES = List.of(LANGUAGES_ENRICH, CLIENT_IPS_ENRICH);
 
     /**
      * <p>
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+client_ip:keyword,env:keyword
+172.21.0.5,Development
+172.21.2.113,QA
+172.21.2.162,QA
+172.21.3.15,Production
+172.21.3.16,Production

x-pack/plugin/esql/qa/testFixtures/src/main/resources/date.csv-spec

Lines changed: 42 additions & 0 deletions
@@ -725,3 +725,45 @@ birth_date:datetime
 1952-02-27T00:00:00.000Z
 1953-04-21T00:00:00.000Z
 ;
+
+docsGettingStartedAutoBucket
+// tag::gs-auto_bucket[]
+FROM sample_data
+| KEEP @timestamp
+| EVAL bucket = AUTO_BUCKET (@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
+// end::gs-auto_bucket[]
+| LIMIT 0
+;
+
+@timestamp:date | bucket:date
+;
+
+docsGettingStartedAutoBucketStatsBy
+// tag::gs-auto_bucket-stats-by[]
+FROM sample_data
+| KEEP @timestamp, event_duration
+| EVAL bucket = AUTO_BUCKET (@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
+| STATS COUNT(*) BY bucket
+// end::gs-auto_bucket-stats-by[]
+| SORT bucket
+;
+
+COUNT(*):long | bucket:date
+2 |2023-10-23T12:00:00.000Z
+5 |2023-10-23T13:00:00.000Z
+;
+
+docsGettingStartedAutoBucketStatsByMedian
+// tag::gs-auto_bucket-stats-by-median[]
+FROM sample_data
+| KEEP @timestamp, event_duration
+| EVAL bucket = AUTO_BUCKET (@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
+| STATS median_duration = MEDIAN(event_duration) BY bucket
+// end::gs-auto_bucket-stats-by-median[]
+| SORT bucket
+;
+
+median_duration:double | bucket:date
+3107561.0 |2023-10-23T12:00:00.000Z
+1756467.0 |2023-10-23T13:00:00.000Z
+;

x-pack/plugin/esql/qa/testFixtures/src/main/resources/dissect.csv-spec

Lines changed: 27 additions & 0 deletions
@@ -159,6 +159,33 @@ emp_no:integer | a:keyword | b:keyword | c:keyword
 10006 | [Principal, Senior] | [Support, Team] | [Engineer, Lead]
 ;
 
+docsGettingStartedDissect
+// tag::gs-dissect[]
+FROM sample_data
+| DISSECT message "Connected to %{server_ip}"
+// end::gs-dissect[]
+| LIMIT 0
+;
+
+@timestamp:date | client_ip:ip | event_duration:long | message:keyword | server_ip:keyword
+;
+
+docsGettingStartedDissectStatsBy
+// tag::gs-dissect-stats-by[]
+FROM sample_data
+| WHERE STARTS_WITH(message, "Connected to")
+| DISSECT message "Connected to %{server_ip}"
+| STATS COUNT(*) BY server_ip
+// end::gs-dissect-stats-by[]
+| SORT server_ip
+;
+
+COUNT(*):long | server_ip:keyword
+1 |10.1.0.1
+1 |10.1.0.2
+1 |10.1.0.3
+;
+
 emptyPattern#[skip:-8.11.99]
 ROW a="b c d"| DISSECT a "%{b} %{} %{d}";
 

x-pack/plugin/esql/qa/testFixtures/src/main/resources/docs.csv-spec

Lines changed: 65 additions & 1 deletion
@@ -650,4 +650,68 @@ FROM employees
 first_name:keyword | last_name:keyword
 Alejandro |McAlpine
 // end::rlike-result[]
-;
+;
+
+docsGettingStartedFrom
+// tag::gs-from[]
+FROM sample_data
+// end::gs-from[]
+| LIMIT 0
+;
+
+@timestamp:date | client_ip:ip | event_duration:long | message:keyword
+;
+
+docsGettingStartedFromLowercase
+// tag::gs-from-lowercase[]
+from sample_data
+// end::gs-from-lowercase[]
+| LIMIT 0
+;
+
+@timestamp:date | client_ip:ip | event_duration:long | message:keyword
+;
+
+docsGettingStartedLimit
+// tag::gs-limit[]
+FROM sample_data
+| LIMIT 3
+// end::gs-limit[]
+| LIMIT 0
+;
+
+@timestamp:date | client_ip:ip | event_duration:long | message:keyword
+;
+
+docsGettingStartedLimitOneLine
+// tag::gs-limit-one-line[]
+FROM sample_data | LIMIT 3
+// end::gs-limit-one-line[]
+| LIMIT 0
+;
+
+@timestamp:date | client_ip:ip | event_duration:long | message:keyword
+;
+
+docsGettingStartedSort
+// tag::gs-sort[]
+FROM sample_data
+| SORT @timestamp DESC
+// end::gs-sort[]
+| LIMIT 0
+;
+
+@timestamp:date | client_ip:ip | event_duration:long | message:keyword
+;
+
+docsGettingStartedChaining
+// tag::gs-chaining[]
+FROM sample_data
+| SORT @timestamp DESC
+| LIMIT 3
+// end::gs-chaining[]
+| LIMIT 0
+;
+
+@timestamp:date | client_ip:ip | event_duration:long | message:keyword
+;
