[DOCS][101] Create Elasticsearch basics section, refactor quickstarts section #112436
Changes from 18 commits
@@ -2,12 +2,12 @@ | |
== {es} basics | ||
|
||
This guide covers the core concepts you need to understand to get started with {es}. | ||
If you'd prefer to start working with {es} right away, set up a <<run-elasticsearch-locally,local dev environment>> and jump to <<quickstart,hands-on code examples>> . | ||
If you'd prefer to start working with {es} right away, set up a <<run-elasticsearch-locally,local development environment>> and jump to <<quickstart,hands-on code examples>>. | ||
|
||
This guide covers the following topics: | ||
|
||
* <<elasticsearch-intro-what-is-es>>: Learn about {es} and some of its main use cases. | ||
* <<elasticsearch-intro-deploy>>: Understand your options for deploying {es} in different environments, including a fast local dev setup. | ||
* <<elasticsearch-intro-deploy>>: Understand your options for deploying {es} in different environments, including a fast local development setup. | ||
* <<documents-indices>>: Understand {es}'s most important primitives and how it stores data. | ||
* <<es-ingestion-overview>>: Understand your options for ingesting data into {es}. | ||
* <<search-analyze>>: Understand your options for searching and analyzing data in {es}. | ||
|
@@ -49,18 +49,18 @@ Combined with https://www.elastic.co/kibana[{kib}], it powers the following Elas | |
**Observability** | ||
|
||
* *Logs, metrics, and traces*: Collect, store, and analyze logs, metrics, and traces from applications, systems, and services. | ||
* *Application performance monitoring (APM)*: Monitor and analyze application performance data. | ||
* *Application performance monitoring (APM)*: Monitor and analyze the performance of business-critical software applications. | ||
* *Real user monitoring (RUM)*: Monitor, quantify, and analyze user interactions with web applications. | ||
* *OpenTelemetry*: Elastic has full native support for OpenTelemetry data. | ||
* *OpenTelemetry*: Reuse your existing instrumentation to send telemetry data to the Elastic Stack using the OpenTelemetry standard. | ||
|
||
**Search** | ||
|
||
* *Full-text search*: Fast, relevant full-text search using inverted indexes, tokenization, and text analysis. | ||
* *Full-text search*: Build a fast, relevant full-text search solution using inverted indexes, tokenization, and text analysis. | ||
* *Vector database*: Store and search vectorized data, and create vector embeddings with built-in and third-party natural language processing (NLP) models. | ||
|
||
* *Semantic search*: Understand the intent and contextual meaning behind search queries using tools like synonyms, dense vector embeddings, and learned sparse query/document expansion. | ||
* *Semantic search*: Understand the intent and contextual meaning behind search queries using tools like synonyms, dense vector embeddings, and learned sparse query-document expansion. | ||
* *Hybrid search*: Combine full-text search with vector search using state-of-the-art ranking algorithms. | ||
|
||
* *Search applications*: Add hybrid search capabilities to apps or websites, or build enterprise search engines over your organization's internal data sources. | ||
* *Retrieval augmented generation (RAG)*: Use {es} as a retrieval engine to update and augment Generative AI models. | ||
* *Build search experiences*: Add hybrid search capabilities to apps or websites, or build enterprise search engines over your organization's internal data sources. | ||
* *Retrieval augmented generation (RAG)*: Use {es} as a retrieval engine to supplement generative AI models with more relevant, up-to-date, or proprietary data for a range of use cases. | ||
* *Geospatial search*: Search for locations and calculate spatial relationships using geospatial queries. | ||
|
||
**Security** | ||
|
@@ -80,7 +80,7 @@ You can deploy {es} in various ways. | |
|
||
**Quick start option** | ||
|
||
* <<run-elasticsearch-locally,*Local dev*>>: Get started quickly with a minimal local Docker setup for development and testing. | ||
* <<run-elasticsearch-locally,*Local development*>>: Get started quickly with a minimal local Docker setup for development and testing. | ||
|
||
**Hosted options** | ||
|
||
|
@@ -152,10 +152,10 @@ A simple {es} document might look like this: | |
|
||
[discrete] | ||
[[elasticsearch-intro-documents-fields-data-metadata]] | ||
==== Data and metadata | ||
==== Metadata fields | ||
|
||
An indexed document contains data and metadata. | ||
In {es}, <<mapping-fields,metadata fields>> are prefixed with an underscore. | ||
An indexed document contains data and metadata. <<mapping-fields,Metadata fields>> are system fields that store information about the documents. | ||
In {es}, metadata fields are prefixed with an underscore. | ||
For example, the following fields are metadata fields: | ||
|
||
* `_index`: The name of the index where the document is stored. | ||
|
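To make the split between data and metadata concrete, here is a sketch of a get-document request and its response; the index name and document content are hypothetical:

```console
GET my-index/_doc/1

# Returns the stored data wrapped in metadata fields, roughly:
# {
#   "_index": "my-index",             <- metadata: the containing index
#   "_id": "1",                       <- metadata: the document identifier
#   "_version": 1,                    <- metadata: incremented on each update
#   "found": true,
#   "_source": { "title": "Hello" }   <- the data you indexed
# }
```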
@@ -170,7 +170,7 @@ A mapping defines the <<mapping-types,data type>> for each field, how the field | |
and how it should be stored. | ||
When adding documents to {es}, you have two options for mappings: | ||
|
||
* <<mapping-dynamic, Dynamic mapping>>: Let {es} automatically detect the data types and create the mappings for you. This is great for getting started quickly, but might yield suboptimal results for your specific use case due to automatic field type inference. | ||
* <<mapping-dynamic, Dynamic mapping>>: Let {es} automatically detect the data types and create the mappings for you. Dynamic mapping helps you get started quickly, but might yield suboptimal results for your specific use case due to automatic field type inference. | ||
* <<mapping-explicit, Explicit mapping>>: Define the mappings up front by specifying data types for each field. Recommended for production use cases, because you have full control over how your data is indexed to suit your specific use case. | ||
|
||
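For instance, a minimal explicit mapping might be defined when creating an index; the index and field names here are hypothetical:

```console
# Define data types up front with an explicit mapping
PUT my-index
{
  "mappings": {
    "properties": {
      "title":   { "type": "text" },
      "created": { "type": "date" },
      "views":   { "type": "integer" }
    }
  }
}
```

With dynamic mapping, you would skip this step and let {es} infer the types from the first documents you index.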
[TIP] | ||
|
@@ -184,34 +184,42 @@ This is useful when you have a mix of known and unknown fields in your data. | |
=== Add data to {es} | ||
Review discussion (resolved):

> The changes that you made to this page are so valuable! I love being able to understand how to get data into ES at a high level. It would have been a game changer for March Shaina.

> The origin of this content has been available for a couple of years, but somewhat hidden: https://www.elastic.co/guide/en/cloud/current/ec-cloud-ingest-data.html I've been using this starter content as the foundation of my ingest experience work, which means we'll have some duplication to sort out. We knew that there would be overlaps, so no big surprise there. We'll figure out the best way to communicate this info to users, and do the right thing.

> 💯 Karen, it's just that the timestamped data decision tree is a little heavy for a brand-new user, who isn't going to be thinking about data pipelines from the get-go. I made sure to link to that page if that is what a reader is looking for. There's no specific reason I can tell why that original page is part of the cloud docs, but again, a topic for another day :)
||
|
||
There are multiple ways to ingest data into {es}. | ||
|
||
The option that you choose depends on whether you're working with timestamped data, non-timestamped data, where the data is coming from, its complexity, and more. | ||
The option that you choose depends on whether you're working with timestamped data or non-timestamped data, where the data is coming from, its complexity, and more. | ||
|
||
[TIP] | ||
==== | ||
You can load {kibana-ref}/connect-to-elasticsearch.html#_add_sample_data[sample data] into your {es} cluster using {kib}, to get started quickly. | ||
==== | ||
|
||
[discrete] | ||
[[es-ingestion-overview-general-content]] | ||
==== General content | ||
|
||
General content is data that does not have a timestamp. | ||
This could be data like vector embeddings, website content, product catalogs, or more. | ||
This could be data like vector embeddings, website content, product catalogs, and more. | ||
For general content, you have the following options for adding data to {es} indices: | ||
|
||
* <<docs,API>>: Use the {es} <<docs,Document APIs>> to index documents directly, using the Dev Tools {kibana-ref}/console-kibana.html[Console], cURL. | ||
** You can use https://www.elastic.co/guide/en/elasticsearch/client/index.html[{es} programming language clients] to index documents in your programming language of choice. For Python devs, check out the `elasticsearch-labs` repo for various https://github.com/elastic/elasticsearch-labs/tree/main/notebooks/search/python-examples[example notebooks]. | ||
* {kibana-ref}/connect-to-elasticsearch.html#upload-data-kibana[File upload]: Use the {kib} file uploader to upload and index CSV, JSON, and log files. | ||
* {kibana-ref}/connect-to-elasticsearch.html#_add_sample_data[Sample data]: Load sample data sets into your {es} cluster using {kib}. | ||
* <<docs,API>>: Use the {es} <<docs,Document APIs>> to index documents directly, using the Dev Tools {kibana-ref}/console-kibana.html[Console], or cURL. | ||
+ | ||
If you're building a website or app, then you can call Elasticsearch APIs using an https://www.elastic.co/guide/en/elasticsearch/client/index.html[{es} client] in the programming language of your choice. If you use the Python client, then check out the `elasticsearch-labs` repo for various https://github.com/elastic/elasticsearch-labs/tree/main/notebooks/search/python-examples[example notebooks]. | ||
* {kibana-ref}/connect-to-elasticsearch.html#upload-data-kibana[File upload]: Use the {kib} file uploader to index single files for one-off testing and exploration. The GUI guides you through setting up your index and field mappings. | ||
* https://github.com/elastic/crawler[Web crawler]: Extract and index web page content into {es} documents. | ||
* {enterprise-search-ref}/connectors.html[Connectors]: Sync data from various third-party data sources to create searchable, read-only replicas in {es}. | ||
|
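As a minimal sketch of the API option (the index name and fields are hypothetical), you can index documents one at a time or in batches:

```console
# Index a single document; with dynamic mapping enabled,
# the index and field mappings are created on first use
POST my-index/_doc
{
  "title": "Wireless headphones",
  "description": "Over-ear, noise cancelling"
}

# Index several documents in one request with the bulk API
POST my-index/_bulk
{ "index": {} }
{ "title": "USB-C cable" }
{ "index": {} }
{ "title": "Laptop stand" }
```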
||
[discrete] | ||
[[es-ingestion-overview-timestamped]] | ||
==== Timestamped data | ||
|
||
Timestamped data in {es} refers to datasets that include a timestamp field, typically named `@timestamp` when using the https://www.elastic.co/guide/en/ecs/current/ecs-reference.html[Elastic Common Schema (ECS)]. | ||
|
||
This could be data like logs, metrics, and traces. | ||
|
||
For timestamped data, you have the following options for adding data to {es} data streams: | ||
|
||
|
||
* {fleet-guide}/fleet-overview.html[Elastic Agent and Fleet]: The preferred way to index timestamped data. Each Elastic Agent based integration includes default ingestion rules, dashboards, and visualizations to start analyzing your data right away. | ||
You can use the Fleet UI in {kib} to centrally manage Elastic Agents and their policies. | ||
* {beats-ref}/beats-reference.html[Beats]: If your data source isn't supported by Elastic Agent, use Beats to collect and ship data. You install a separate Beat for each type of data to collect. | ||
* {beats-ref}/beats-reference.html[Beats]: If your data source isn't supported by Elastic Agent, use Beats to collect and ship data to {es}. You install a separate Beat for each type of data to collect. | ||
* {logstash-ref}/introduction.html[Logstash]: Logstash is an open source data collection engine with real-time pipelining capabilities that supports a wide variety of data sources. You might use this option because neither Elastic Agent nor Beats supports your data source. You can also use Logstash to persist incoming data, or if you need to send the data to multiple destinations. | ||
* {cloud}/ec-ingest-guides.html[Language clients]: The linked tutorials demonstrate how to use {es} programming language clients to ingest data from an application. (In these examples, {es} is running on Elastic Cloud, but the same principles apply to any {es} deployment.) | ||
* {cloud}/ec-ingest-guides.html[Language clients]: The linked tutorials demonstrate how to use {es} programming language clients to ingest data from an application. In these examples, {es} is running on Elastic Cloud, but the same principles apply to any {es} deployment. | ||
|
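Whichever collection tool you use, timestamped documents end up in a data stream. As a sketch, you can also append a document directly; the data stream name is hypothetical and assumes a matching index template exists:

```console
# Append a timestamped document to a data stream
POST logs-myapp-default/_doc
{
  "@timestamp": "2024-05-01T12:00:00Z",
  "message": "User logged in",
  "log.level": "info"
}
```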
||
[TIP] | ||
==== | ||
|
@@ -226,19 +234,21 @@ You can use {es} as a basic document store to retrieve documents and their | |
metadata. | ||
However, the real power of {es} comes from its advanced search and analytics capabilities. | ||
|
||
You'll use a combination of an API endpoint and a query language to interact with your data. | ||
|
||
[discrete] | ||
[[search-analyze-rest-api]] | ||
==== Rest API | ||
==== REST API | ||
|
||
Use {es}'s REST API to manage your cluster, and to index | ||
Use REST APIs to manage your {es} cluster, and to index | ||
and search your data. | ||
For testing purposes, you can submit requests | ||
directly from the command line or through the Dev Tools {kibana-ref}/console-kibana.html[Console] in {kib}. | ||
From your applications, you can use an | ||
https://www.elastic.co/guide/en/elasticsearch/client/index.html[{es} client] | ||
From your applications, you can use a | ||
https://www.elastic.co/guide/en/elasticsearch/client/index.html[client] | ||
in your programming language of choice. | ||
|
||
Refer to <<getting-started,first steps with Elasticsearch>> for a hands-on example of using the REST API, adding data to {es}, and running basic searches. | ||
Refer to <<getting-started,first steps with Elasticsearch>> for a hands-on example of using the `_search` endpoint, adding data to {es}, and running basic searches in Query DSL syntax. | ||
|
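For example, a basic search request submitted through the Console or cURL might look like this; the index and field names are hypothetical:

```console
GET my-index/_search
{
  "query": {
    "match": { "title": "quick start" }
  }
}
```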
||
[discrete] | ||
[[search-analyze-query-languages]] | ||
|
@@ -249,7 +259,9 @@ Refer to <<getting-started,first steps with Elasticsearch>> for a hands-on examp | |
*Query DSL* is the primary query language for {es} today. | ||
|
||
*{esql}* is a new piped query language and compute engine which was first added in version *8.11*. | ||
It does not yet support all the features of Query DSL, like full-text search and semantic search. | ||
|
||
{esql} does not yet support all the features of Query DSL, like full-text search and semantic search. | ||
Look forward to new {esql} features and functionalities in each release. | ||
|
||
Refer to <<search-analyze-query-languages>> for a full overview of the query languages available in {es}. | ||
|
||
|
@@ -260,6 +272,8 @@ Refer to <<search-analyze-query-languages>> for a full overview of the query lan | |
<<query-dsl, Query DSL>> is a full-featured JSON-style query language that enables complex searching, filtering, and aggregations. | ||
It is the original and most powerful query language for {es} today. | ||
|
||
The <<search-your-data, `_search` endpoint>> accepts queries written in Query DSL syntax. | ||
|
||
|
||
[discrete] | ||
[[search-analyze-query-dsl-search-filter]] | ||
====== Search and filter with Query DSL | ||
|
@@ -272,7 +286,7 @@ Query DSL support a wide range of search techniques, including the following: | |
* <<knn-search,*Vector search*>>: Search for similar dense vectors using the kNN algorithm for embeddings generated outside of {es}. | ||
* <<geo-queries,*Geospatial search*>>: Search for locations and calculate spatial relationships using geospatial queries. | ||
|
||
Learn about the full range of queries supported by the <<query-dsl,Query DSL>>. | ||
Learn about the full range of queries supported by <<query-dsl,Query DSL>>. | ||
|
||
You can also filter data using Query DSL. | ||
Filters enable you to include or exclude documents by retrieving documents that match specific field-level criteria. | ||
|
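As a sketch (field names and values are hypothetical), a `bool` query can combine a scored full-text clause with non-scoring filters:

```console
GET my-index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "wireless headphones" } }
      ],
      "filter": [
        { "term":  { "in_stock": true } },
        { "range": { "price": { "lte": 100 } } }
      ]
    }
  }
}
```

Clauses under `filter` run in filter context: they do not affect relevance scores and can be cached.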
@@ -286,7 +300,7 @@ A query that uses the `filter` parameter indicates <<filter-context,filter conte | |
Aggregations enable you to build complex summaries of your data and gain | ||
insight into key metrics, patterns, and trends. | ||
|
||
Because aggregations leverage the same data-structures used for search, they are | ||
Because aggregations leverage the same data structures used for search, they are | ||
also very fast. This enables you to analyze and visualize your data in real time. | ||
You can search documents, filter results, and perform analytics at the same time, on the same | ||
data, in a single request. | ||
|
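For example (field names are hypothetical), a single request can filter documents and compute summaries over the matches:

```console
GET my-index/_search
{
  "size": 0,
  "query": { "range": { "price": { "lte": 100 } } },
  "aggs": {
    "avg_price":   { "avg":   { "field": "price" } },
    "by_category": { "terms": { "field": "category.keyword" } }
  }
}
```

Setting `size` to `0` skips returning individual hits, so the response contains only the aggregation results.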
@@ -310,6 +324,9 @@ Learn more in <<run-an-agg,Run an aggregation>>. | |
<<esql,Elasticsearch Query Language ({esql})>> is a piped query language for filtering, transforming, and analyzing data. | ||
{esql} is built on top of a new compute engine, where search, aggregation, and transformation functions are | ||
directly executed within {es} itself. | ||
{esql} syntax can also be used within various {kib} tools. | ||
|
||
The <<esql-rest,`_query` endpoint>> accepts queries written in {esql} syntax. | ||
|
||
|
||
Today, it supports a subset of the features available in Query DSL, like aggregations, filters, and transformations. | ||
It does not yet support full-text search or semantic search. | ||
|
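As a sketch (the index and field names are hypothetical), an {esql} query submitted to the `_query` endpoint might look like this:

```console
POST _query
{
  "query": """
    FROM my-index
    | WHERE price <= 100
    | STATS avg_price = AVG(price) BY category
    | LIMIT 10
  """
}
```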
@@ -320,9 +337,7 @@ Learn more in <<esql-getting-started,Getting started with {esql}>>, or try https | |
|
||
[discrete] | ||
[[search-analyze-data-query-languages-table]] | ||
==== Query languages overview | ||
|
||
// TODO: do I belong here? | ||
==== List of available query languages | ||
|
||
The following table summarizes all available {es} query languages, to help you choose the right one for your use case. | ||
|
||
|
@@ -331,8 +346,8 @@ The following table summarizes all available {es} query languages, to help you c | |
| Name | Description | Use cases | API endpoint | ||
|
||
|
||
| <<query-dsl,Query DSL>> | ||
| Primary query language for {es}. Powerful and flexible JSON-style language that enables complex queries. | ||
| Supports full-text search, semantic search, keyword search, filtering, aggregations, and more. | ||
| The primary query language for {es}. A powerful and flexible JSON-style language that enables complex queries. | ||
| Full-text search, semantic search, keyword search, filtering, aggregations, and more. | ||
| <<search-search,`_search`>> | ||
|
||
|
||
|
@@ -345,7 +360,7 @@ Does not yet support full-text search. | |
|
||
|
||
| <<eql,EQL>> | ||
| Event Query Language (EQL) is a query language for event-based time series data. Data must contain an `@timestamp` field to use EQL. | ||
| Event Query Language (EQL) is a query language for event-based time series data. Data must contain the `@timestamp` field to use EQL. | ||
| Designed for the threat hunting security use case. | ||
| <<eql-apis,`_eql`>> | ||
|
||
|
@@ -354,6 +369,11 @@ Does not yet support full-text search. | |
| Enables users familiar with SQL to query {es} data using familiar syntax for BI and reporting. | ||
| <<sql-apis,`_sql`>> | ||
|
||
| {kibana-ref}/kuery-query.html[Kibana Query Language (KQL)] | ||
| Kibana Query Language (KQL) is a text-based query language for filtering data that is only available in the {kib} UI. | ||
|
||
| Use KQL to filter documents where a value for a field exists, matches a given value, or is within a given range. | ||
| N/A | ||
|
||
|=== | ||
|
||
// New html page | ||
|