[DOCS] Rewrite "What is Elasticsearch?" (Part 1) #112213
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,42 +1,77 @@ | ||
[[elasticsearch-intro]] | ||
== What is {es}? | ||
_**You know, for search (and analysis)**_ | ||
|
||
{es} is the distributed search and analytics engine at the heart of | ||
the {stack}. {ls} and {beats} facilitate collecting, aggregating, and | ||
enriching your data and storing it in {es}. {kib} enables you to | ||
interactively explore, visualize, and share insights into your data and manage | ||
and monitor the stack. {es} is where the indexing, search, and analysis | ||
magic happens. | ||
|
||
{es} provides near real-time search and analytics for all types of data. Whether you | ||
have structured or unstructured text, numerical data, or geospatial data, | ||
{es} can efficiently store and index it in a way that supports fast searches. | ||
You can go far beyond simple data retrieval and aggregate information to discover | ||
trends and patterns in your data. And as your data and query volume grows, the | ||
distributed nature of {es} enables your deployment to grow seamlessly right | ||
along with it. | ||
|
||
While not _every_ problem is a search problem, {es} offers speed and flexibility | ||
to handle data in a wide variety of use cases: | ||
|
||
* Add a search box to an app or website | ||
* Store and analyze logs, metrics, and security event data | ||
* Use machine learning to automatically model the behavior of your data in real | ||
time | ||
* Use {es} as a vector database to create, store, and search vector embeddings | ||
* Automate business workflows using {es} as a storage engine | ||
* Manage, integrate, and analyze spatial information using {es} as a geographic | ||
information system (GIS) | ||
* Store and process genetic data using {es} as a bioinformatics research tool | ||
|
||
We’re continually amazed by the novel ways people use search. But whether | ||
your use case is similar to one of these, or you're using {es} to tackle a new | ||
problem, the way you work with your data, documents, and indices in {es} is | ||
the same. | ||
|
||
https://github.com/elastic/elasticsearch[{es}] is a distributed, RESTful search and analytics engine, scalable data store, and vector database built in Java on top of the Apache Lucene library. | ||
Use {es} to search, index, store, and analyze data of all shapes and sizes in near real-time. | ||
|
||
|
||
[TIP] | ||
==== | ||
{es} has a lot of features. Explore the full list on the https://www.elastic.co/elasticsearch/features[product webpage^]. | ||
==== | ||
|
||
{es} is the heart of the <<elasticsearch-intro-elastic-stack,Elastic Stack>> and powers the Elastic https://www.elastic.co/enterprise-search[Search], https://www.elastic.co/observability[Observability], and https://www.elastic.co/security[Security] solutions. | ||
|
||
|
||
{es} is used for a wide and growing range of use cases. Here are a few examples: | ||
|
||
* *Monitor log and event data*. Store and analyze logs, metrics, and security event data for operational insights and SIEM. | ||
|
||
* *Build search applications*. Add search capabilities to apps or websites and build enterprise search engines over your organization's internal data sources. | ||
|
||
* *Vector database*. Store and search vectorized data, and create vector embeddings with built-in and third-party NLP models. | ||
|
||
* *Retrieval augmented generation (RAG)*. Use {es} as a retrieval engine to augment Generative AI models. | ||
* *Application and security monitoring*. Monitor and analyze application performance and security data effectively. | ||
* *Machine learning*. Use machine learning to automatically model the behavior of your data in real time. | ||
|
||
|
||
This is just a sample of search, observability, and security use cases enabled by {es}. | ||
Refer to our https://www.elastic.co/customers/success-stories[customer success stories] for concrete examples across a range of industry verticals. | ||
|
||
// Link to demos, search labs chatbots | ||
|
||
[discrete] | ||
[[elasticsearch-intro-elastic-stack]] | ||
.What is the Elastic Stack? | ||
******************************* | ||
The Elastic Stack refers to the suite of products enabled by {es}: | ||
|
||
|
||
* https://www.elastic.co/guide/en/kibana/current/index.html[Kibana]. A UI for visualizing and exploring data in {es}. | ||
* https://www.elastic.co/guide/en/elasticsearch/client/index.html[Client libraries]. Work with {es} in your preferred programming language. | ||
* https://www.elastic.co/guide/en/logstash/current/introduction.html[Logstash]. A server-side data processing pipeline for ingesting and transforming data from multiple sources and indexing into {es}. | ||
* https://www.elastic.co/guide/en/fleet/current/fleet-overview.html[Fleet and Elastic Agent]. Elastic Agent is a single, unified way to add monitoring for logs, metrics, and other types of data to a host. Fleet is a central place to configure and monitor your Elastic Agents. | ||
|
||
* https://www.elastic.co/guide/en/beats/libbeat/current/beats-reference.html[Beats]. Lightweight data shippers for sending data from edge machines to {es}. | ||
* https://www.elastic.co/guide/en/observability/current/apm.html[APM]. Monitor the performance of your applications. | ||
* https://www.elastic.co/guide/en/elasticsearch/hadoop/current/float.html[{es} Hadoop]. Use {es} as a Hadoop input/output format. | ||
|
||
|
||
https://www.elastic.co/guide/en/starting-with-the-elasticsearch-platform-and-its-solutions/current/stack-components.html[Learn more about the Elastic Stack]. | ||
******************************* | ||
// TODO: Remove once we've moved Stack Overview to a subpage? | ||
|
||
[discrete] | ||
[[elasticsearch-intro-deploy]] | ||
=== Deployment options | ||
|
||
To use {es}, you need a running instance of the {es} service. | ||
You can deploy {es} in various ways: | ||
I think we need clearer "some of these cost money and some of these are free" messaging. ECE and ECK are especially confusing in comparison to self-managed. Can also be a follow-up.

I think it's OK to just list the options and let devs decide. Free trials imply paid services. And honestly I don't think anyone "gets started" with ES by using ECE or ECK. I like concision here and we have links to learn more.

Maybe we could categorize self-managed, ECE, and ECK into an "advanced deployment" options section.

That would work for me, although self-managed is also the beginner/builder path.

That's why local dev has its own category and has top billing 😄

Ah, my bad.
||
|
||
* https://elastic.co/guide/en/cloud/current/ec-getting-started.html[*Elastic Cloud*]. {es} is available as part of our hosted Elastic Stack offering, deployed in the cloud with your provider of choice. Sign up for a https://cloud.elastic.co/registration[14-day free trial]. | ||
* https://elastic.co/guide/en/cloud-enterprise/current/Elastic-Cloud-Enterprise-overview.html[*Elastic Cloud Enterprise*]. Deploy Elastic Cloud on public or private clouds, virtual machines, or your own premises. | ||
* https://elastic.co/guide/en/cloud-on-k8s/current/k8s-overview.html[*Elastic Cloud on Kubernetes*]. Deploy Elastic Cloud on Kubernetes. | ||
* https://www.elastic.co/docs/current/serverless[*Elastic Cloud Serverless* (technical preview)]. Create serverless projects for autoscaled and fully managed {es} deployments. Sign up for a https://cloud.elastic.co/serverless-registration[14-day free trial]. | ||
* <<elasticsearch-deployment-options,*Self-managed*>>. Install, configure, and run {es} on your own premises. | ||
|
||
+ | ||
[TIP] | ||
==== | ||
If you just want to get started quickly with a minimal local setup, refer to <<run-elasticsearch-locally,Run {es} locally>>. | ||
==== | ||
|
||
|
||
[discrete] | ||
[[elasticsearch-next-steps]] | ||
=== Learn more | ||
|
||
* <<getting-started, Quickstart>>. A beginner's guide to deploying your first {es} instance, indexing data, and running queries. | ||
|
||
* https://elastic.co/webinars/getting-started-elasticsearch[Webinar: Introduction to {es}]. Register for our live webinars to learn directly from {es} experts. | ||
* https://www.elastic.co/search-labs[Elastic Search Labs]. Tutorials and blogs that explore AI-powered search using the latest {es} features. | ||
** Follow our tutorial https://www.elastic.co/search-labs/tutorials/search-tutorial/welcome[to build a hybrid search solution in Python]. | ||
** Check out the https://github.com/elastic/elasticsearch-labs?tab=readme-ov-file#elasticsearch-examples--apps[`elasticsearch-labs` repository] for a range of Python notebooks and apps for various use cases. | ||
|
||
[[documents-indices]] | ||
=== Data in: documents and indices | ||
=== Documents and indices | ||
|
||
{es} is a distributed document store. Instead of storing information as rows of | ||
columnar data, {es} stores complex data structures that have been serialized | ||
|
@@ -65,8 +100,7 @@ behavior makes it easy to index and explore your data--just start | |
indexing documents and {es} will detect and map booleans, floating point and | ||
integer values, dates, and strings to the appropriate {es} data types. | ||
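
As a rough illustration of the dynamic mapping behavior described above, the following Python sketch guesses a field type for each value in a JSON-like document. It is a simplified model, not how {es} implements type detection: the function names `infer_field_type` and `infer_mapping` are hypothetical, and ISO 8601 parsing is only an assumed stand-in for date detection.

```python
from datetime import datetime


def infer_field_type(value):
    """Guess a field type for one JSON value (simplified, illustrative only)."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            # Assumption: ISO 8601 parsing approximates date detection.
            datetime.fromisoformat(value)
            return "date"
        except ValueError:
            return "text"
    if isinstance(value, dict):
        return "object"
    return "unknown"


def infer_mapping(document):
    """Build a field-name -> guessed-type mapping for a flat document."""
    return {field: infer_field_type(value) for field, value in document.items()}


doc = {"name": "Ada", "active": True, "age": 36, "score": 9.5, "hired": "2015-06-01"}
mapping = infer_mapping(doc)
print(mapping)
```

An explicit mapping plays the role of overriding these guesses, for example when a string that looks like a date should be treated as plain text.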
|
||
Ultimately, however, you know more about your data and how you want to use it | ||
than {es} can. You can define rules to control dynamic mapping and explicitly | ||
You can define rules to control dynamic mapping and explicitly | ||
define mappings to take full control of how fields are stored and indexed. | ||
|
||
Defining your own mappings enables you to: | ||
|
@@ -88,94 +122,6 @@ The analysis chain that is applied to a full-text field during indexing is also | |
used at search time. When you query a full-text field, the query text undergoes | ||
the same analysis before the terms are looked up in the index. | ||
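
To see why the same analysis matters at index time and at search time, here is a minimal Python sketch of a toy inverted index. The `analyze` function and `TinyIndex` class are hypothetical names for illustration only; a real analyzer does far more than lowercasing and stripping punctuation.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase, split on whitespace, strip surrounding punctuation."""
    tokens = (token.strip(".,!?;:\"'()") for token in text.lower().split())
    return [token for token in tokens if token]


class TinyIndex:
    """Toy inverted index mapping each analyzed term to a set of document IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def index(self, doc_id, text):
        # Index time: run the text through the analyzer before storing terms.
        for term in analyze(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Search time: the query goes through the same analyzer, so "QUICK"
        # matches the stored term "quick".
        terms = analyze(query)
        if not terms:
            return set()
        results = [self.postings.get(term, set()) for term in terms]
        return set.intersection(*results)


idx = TinyIndex()
idx.index(1, "The Quick brown fox.")
idx.index(2, "A quick start guide")
hits = idx.search("QUICK fox")
print(hits)
```

If the query skipped analysis, the uppercase term `QUICK` would never match the lowercase terms stored in the index.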
|
||
[[search-analyze]] | ||
=== Information out: search and analyze | ||
|
||
|
||
While you can use {es} as a document store and retrieve documents and their | ||
metadata, the real power comes from being able to easily access the full suite | ||
of search capabilities built on the Apache Lucene search engine library. | ||
|
||
{es} provides a simple, coherent REST API for managing your cluster and indexing | ||
and searching your data. For testing purposes, you can easily submit requests | ||
directly from the command line or through the Developer Console in {kib}. From | ||
your applications, you can use the | ||
https://www.elastic.co/guide/en/elasticsearch/client/index.html[{es} client] | ||
for your language of choice: Java, JavaScript, Go, .NET, PHP, Perl, Python | ||
or Ruby. | ||
|
||
[discrete] | ||
[[search-data]] | ||
==== Searching your data | ||
|
||
The {es} REST APIs support structured queries, full text queries, and complex | ||
queries that combine the two. Structured queries are | ||
similar to the types of queries you can construct in SQL. For example, you | ||
could search the `gender` and `age` fields in your `employee` index and sort the | ||
matches by the `hire_date` field. Full-text queries find all documents that | ||
match the query string and return them sorted by _relevance_—how good a | ||
match they are for your search terms. | ||
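
The structured query described above could look roughly like the following request body, built here as a Python dictionary. This is a sketch under assumptions: the field values (`female`, the age range) are invented for illustration, and `gender` is assumed to be mapped as a keyword field. The body would be sent to the `employee` index's `_search` endpoint.

```python
import json

# Filter on the `gender` and `age` fields, then sort by `hire_date`.
search_body = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"gender": "female"}},             # exact-value match
                {"range": {"age": {"gte": 30, "lt": 40}}},  # numeric range
            ]
        }
    },
    "sort": [{"hire_date": {"order": "desc"}}],             # newest hires first
}

request_json = json.dumps(search_body, indent=2)
print(request_json)
```

Filter clauses do not contribute to relevance scoring, which suits yes/no structured conditions like these.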
|
||
In addition to searching for individual terms, you can perform phrase searches, | ||
similarity searches, and prefix searches, and get autocomplete suggestions. | ||
|
||
Have geospatial or other numerical data that you want to search? {es} indexes | ||
non-textual data in optimized data structures that support | ||
high-performance geo and numerical queries. | ||
|
||
You can access all of these search capabilities using {es}'s | ||
comprehensive JSON-style query language (<<query-dsl, Query DSL>>). You can also | ||
construct <<sql-overview, SQL-style queries>> to search and aggregate data | ||
natively inside {es}, and JDBC and ODBC drivers enable a broad range of | ||
third-party applications to interact with {es} via SQL. | ||
|
||
[discrete] | ||
[[analyze-data]] | ||
==== Analyzing your data | ||
|
||
{es} aggregations enable you to build complex summaries of your data and gain | ||
insight into key metrics, patterns, and trends. Instead of just finding the | ||
proverbial “needle in a haystack”, aggregations enable you to answer questions | ||
like: | ||
|
||
* How many needles are in the haystack? | ||
* What is the average length of the needles? | ||
* What is the median length of the needles, broken down by manufacturer? | ||
* How many needles were added to the haystack in each of the last six months? | ||
|
||
You can also use aggregations to answer more subtle questions, such as: | ||
|
||
* What are your most popular needle manufacturers? | ||
* Are there any unusual or anomalous clumps of needles? | ||
|
||
Because aggregations leverage the same data-structures used for search, they are | ||
also very fast. This enables you to analyze and visualize your data in real time. | ||
Your reports and dashboards update as your data changes so you can take action | ||
based on the latest information. | ||
|
||
What’s more, aggregations operate alongside search requests. You can search | ||
documents, filter results, and perform analytics at the same time, on the same | ||
data, in a single request. And because aggregations are calculated in the | ||
context of a particular search, you’re not just displaying a count of all | ||
size 70 needles, you’re displaying a count of the size 70 needles | ||
that match your users' search criteria--for example, all size 70 _non-stick | ||
embroidery_ needles. | ||
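
A single request that combines search and aggregations might be sketched as follows. The field names (`description`, `needle_size`, `manufacturer`, `length`) and the aggregation names are hypothetical, chosen only to mirror the needle example above.

```python
import json

# Search and aggregate in one request: the aggregations are computed only
# over documents that match the query.
agg_body = {
    "size": 0,  # return no hits, only aggregation results
    "query": {
        "bool": {
            "must": [{"match": {"description": "non-stick embroidery"}}],
            "filter": [{"term": {"needle_size": 70}}],
        }
    },
    "aggs": {
        "by_manufacturer": {
            "terms": {"field": "manufacturer"},
            # Nested metric: average needle length per manufacturer bucket.
            "aggs": {"average_length": {"avg": {"field": "length"}}},
        }
    },
}

agg_json = json.dumps(agg_body, indent=2)
print(agg_json)
```

Because the `query` section scopes the aggregation, the counts per manufacturer reflect only the size 70 needles that match the search text.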
|
||
[discrete] | ||
[[more-features]] | ||
===== But wait, there’s more | ||
|
||
Want to automate the analysis of your time series data? You can use | ||
{ml-docs}/ml-ad-overview.html[machine learning] features to create accurate | ||
baselines of normal behavior in your data and identify anomalous patterns. With | ||
machine learning, you can detect: | ||
|
||
* Anomalies related to temporal deviations in values, counts, or frequencies | ||
* Statistical rarity | ||
* Unusual behaviors for a member of a population | ||
|
||
And the best part? You can do this without having to specify algorithms, models, | ||
or other data science-related configurations. | ||
|
||
[[scalability]] | ||
=== Scalability and resilience: clusters, nodes, and shards | ||
++++ | ||
|
@@ -255,13 +201,9 @@ create secondary clusters to serve read requests in geo-proximity to your users. | |
the active leader index and handles all write requests. Indices replicated to | ||
secondary clusters are read-only followers. | ||
|
||
[discrete] | ||
[[admin]] | ||
==== Care and feeding | ||
|
||
|
||
As with any enterprise system, you need tools to secure, manage, and | ||
monitor your {es} clusters. Security, monitoring, and administrative features | ||
that are integrated into {es} enable you to use {kibana-ref}/introduction.html[{kib}] | ||
as a control center for managing a cluster. Features like <<downsampling, | ||
downsampling>> and <<index-lifecycle-management, index lifecycle management>> | ||
help you intelligently manage your data over time. | ||
help you intelligently manage your data over time. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,7 +2,7 @@ | |
|
||
[[near-real-time]] | ||
=== Near real-time search | ||
The overview of <<documents-indices,documents and indices>> indicates that when a document is stored in {es}, it is indexed and fully searchable in _near real-time_--within 1 second. What defines near real-time search? | ||
When a document is stored in {es}, it is indexed and fully searchable in _near real-time_--within 1 second. What defines near real-time search? | ||
I'm now realizing we often have a hyphen in the noun form (there's one here, So it'd be fair to undo that other change I suggested, and just standardize on the hyphenated form for simplicity and readability.
||
|
||
Lucene, the Java libraries on which {es} is based, introduced the concept of per-segment search. A _segment_ is similar to an inverted index, but the word _index_ in Lucene means "a collection of segments plus a commit point". After a commit, a new segment is added to the commit point and the buffer is cleared. | ||
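
The buffer-and-segment behavior described above can be modeled with a small Python sketch. This is a toy model only, not Lucene's implementation: the `ToyShard` class and its methods are hypothetical, and a "segment" here is just an immutable tuple of documents.

```python
class ToyShard:
    """Toy model: documents sit in a buffer until a commit writes a new segment."""

    def __init__(self):
        self.buffer = []    # newly indexed documents, not yet searchable
        self.segments = []  # searchable segments listed by the commit point

    def index(self, doc):
        self.buffer.append(doc)

    def commit(self):
        # Write the buffered documents as a new segment and clear the buffer.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def search(self, word):
        # Only documents in committed segments are visible to search.
        return [
            doc
            for segment in self.segments
            for doc in segment
            if word in doc.split()
        ]


shard = ToyShard()
shard.index("hello world")
before = shard.search("hello")  # document is still in the buffer
shard.commit()
after = shard.search("hello")   # document is now in a segment
print(before, after)
```

The gap between `before` and `after` is the delay that the term _near real-time_ refers to: a document becomes visible to search only once it lands in a segment.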
|
||
|