Commit e15f40d

committed
Draft 2
1 parent 634f9b1 commit e15f40d

2 files changed

+92
-73
lines changed

get-started/elasticsearch.md

Lines changed: 92 additions & 73 deletions
@@ -1,65 +1,77 @@
11
---
2+
description: An introduction to Elasticsearch.
3+
applies_to:
4+
serverless: all
5+
stack: all
26
products:
37
- id: elasticsearch
4-
applies_to:
5-
stack:
68
---
79

810
# {{es}} overview [elasticsearch-overview]
911

10-
{{es}} is a distributed datastore that ingests, indexes, and manages various types of data in near real-time, making them both searchable and analyzable. Built on Apache Lucene, {{es}} scales horizontally across multiple nodes to handle large data volumes while maintaining fast query performance.
12+
{{es}} is a distributed data store that ingests, indexes, and manages diverse data types in near real time, making your data searchable and analyzable. Built on Apache Lucene, {{es}} scales horizontally across multiple nodes to handle large data volumes while maintaining fast query performance.
13+
14+
{{es}} enables you to make large amounts of data quickly searchable. Whether you’re building an e-commerce product search, implementing semantic search with AI, or analyzing log data, {{es}} provides a powerful foundation with efficient indexing and query capabilities.
1115

12-
At its core, {{es}} solves the problem of making large amounts of data quickly searchable. Whether you're building a product search for an e-commerce site, implementing semantic search with AI, or analyzing log data, {{es}} provides the foundation for these use cases through its powerful indexing and query capabilities.
16+
## Key concepts [elasticsearch-key-concepts]
1317

14-
## Distributed architecture [elasticsearch-distributed-architecture]
18+
### Distributed architecture [elasticsearch-distributed-architecture]
1519

16-
{{es}} distributes data across multiple nodes in a cluster. Each node holds a portion of the data in shards, which are self-contained indexes that can be stored on any node.
20+
{{es}} distributes data across multiple nodes within a cluster. Each node stores a portion of the data in shards, which are self-contained indexes that can reside on any node in the cluster.
1721

18-
This distribution enables:
22+
The distributed {{es}} architecture enables the following:
1923

20-
* Horizontal scaling: Add more nodes to increase capacity
21-
* High availability: Data is replicated across nodes to prevent loss
22-
* Parallel processing: Queries execute across shards simultaneously
24+
* Horizontal scaling - Add more nodes to increase capacity
25+
* High availability - Maintained through data replication across nodes to prevent data loss
26+
* Parallel processing - Queries execute across shards simultaneously to deliver fast performance
2327

24-
## Near real-time indexing [elasticsearch-near-real-time-indexing]
28+
### Near real-time indexing [elasticsearch-near-real-time-indexing]
29+
30+
When you send documents to {{es}}, they become searchable within about one second. The near real-time capability makes {{es}} ideal for applications that require immediate data availability.
31+
32+
For example:
2533

26-
When you send documents to Elasticsearch, they become searchable within about one second. This near real-time capability makes Elasticsearch suitable for applications that require immediate data availability, such as:
34+
* Live dashboards that display currently collected system metrics
35+
* Product catalogs that instantly update as inventory changes
36+
* User-generated content that appears in search results the moment the content is created
2737

28-
* Live dashboards showing current system metrics
29-
* Product catalogs that update as inventory changes
30-
* User-generated content that appears in search results immediately
38+
### Schema-on-write with dynamic mapping [elasticsearch-schema-on-write-with-dynamic-mapping]
3139

32-
## Schema-on-write with dynamic mapping [elasticsearch-schema-on-write-with-dynamic-mapping]
40+
When you index documents, {{es}} automatically detects the field types. For example, when a document includes a `price` field with a `29.99` value, {{es}} infers that the value is a floating-point number. You can also define explicit mappings to control exactly how data is stored and indexed.
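
The inference described above can be illustrated with a deliberately simplified sketch. The function names `infer_field_type` and `infer_mapping` are hypothetical and not part of any {{es}} client. Actual dynamic mapping covers more cases, such as date detection and arrays, so treat this as an approximation of the idea rather than the real rules.

```python
from typing import Any


def infer_field_type(value: Any) -> str:
    """Guess a field type name from a parsed JSON value (simplified, hypothetical)."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    return "unknown"


def infer_mapping(document: dict) -> dict:
    """Build a field-name to type-name mapping for one document."""
    return {field: infer_field_type(value) for field, value in document.items()}


# Illustrative document; the field names are assumptions for this example.
doc = {"name": "Wireless headphones", "price": 29.99, "in_stock": True, "quantity": 12}
mapping = infer_mapping(doc)
print(mapping)
```

An explicit mapping replaces this guesswork with types that you choose up front.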
3341

34-
Elasticsearch automatically detects field types when you index documents. If you send a document with a price field containing 29.99, Elasticsearch infers it's a floating-point number. You can also define explicit mappings to control exactly how data is stored and indexed.
42+
Mappings play a key role in the following:
3543

36-
Mappings are important for:
44+
* Storage and query performance optimization
45+
* Specific search feature enablement, such as autocomplete or geospatial search
46+
* Data consistency across documents
3747

38-
* Optimizing storage and query performance
39-
* Enabling specific search features (like autocomplete or geo-search)
40-
* Ensuring data consistency across documents
48+
### Vector capabilities [elasticsearch-vector-capabilities]
4149

42-
## Vector capabilities [elasticsearch-vector-capabilities]
50+
{{es}} functions as a vector database for AI and {{ml}} applications, storing dense vector embeddings alongside traditional text and numeric data.
4351

44-
Elasticsearch serves as a vector database for AI and machine learning applications. It stores dense vector embeddings alongside traditional text and numeric data, enabling:
52+
Vector capabilities enable the following:
4553

46-
* Semantic search: Find content by meaning rather than exact keywords
47-
* Hybrid search: Combine keyword and vector search for best results
48-
* RAG systems: Provide relevant context to large language models
54+
* Semantic search - Find content based on meaning rather than exact keywords
55+
* Hybrid search - Combine keyword and vector-based search results for greater accuracy
56+
* Retrieval-augmented generation (RAG) systems - Provide relevant context to large language models
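
As a rough illustration of how finding content by meaning can work, the following sketch ranks stored embeddings by cosine similarity to a query embedding. The three-dimensional vectors and the names `cosine_similarity` and `nearest` are assumptions made up for this example. It scans every vector, which a production vector index would typically avoid by using an approximate search structure.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        vectors,
        key=lambda doc_id: cosine_similarity(query, vectors[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings; real embeddings have hundreds of dimensions or more.
vectors = {
    "headphones": [0.9, 0.1, 0.0],
    "earbuds": [0.8, 0.2, 0.1],
    "keyboard": [0.0, 0.1, 0.9],
}
top = nearest([1.0, 0.0, 0.0], vectors, k=2)
print(top)
```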
4957

50-
## How Elasticsearch works [how-elasticsearch-works]
58+
## How {{es}} works [how-elasticsearch-works]
5159

52-
### Data flow [elasticsearch-data-flow]
60+
To enable fast and scalable search, {{es}} ingests, analyzes, and indexes data so queries execute across shards and return results in milliseconds.
61+
62+
![How Elasticsearch works](/get-started/images/how-elasticsearch-works.png)
63+
64+
| ----- | ----- |
65+
| 1 | Ingestion - Data enters {{es}} through the REST API, client libraries, or integrations. |
66+
| 2 | Analysis - Text is processed by analyzers, which apply steps such as tokenization and stemming. |
67+
| 3 | Indexing - Documents are stored in shards with inverted indexes for fast retrieval. |
68+
| 4 | Querying - Search requests are distributed to relevant shards and results are merged. |
69+
| 5 | Response - Results are returned, typically in milliseconds. |
5370

54-
1. Ingestion: Data enters Elasticsearch through the REST API, client libraries, or integrations
55-
2. Analysis: Text is processed through analyzers (tokenization, stemming, etc.)
56-
3. Indexing: Documents are stored in shards with inverted indexes for fast retrieval
57-
4. Querying: Search requests are distributed to relevant shards and results are merged
58-
5. Response: Results are returned, typically in milliseconds
5971

6072
### Storage model [elasticsearch-storage-model]
6173

62-
Elasticsearch stores data in indices, which are collections of documents with similar characteristics. Each document is a JSON object with fields.
74+
{{es}} stores data in indices, which are collections of documents with similar characteristics. Each document is a JSON object with fields.
6375

6476
For example:
6577

@@ -74,89 +86,96 @@ For example:
7486
}
7587
```
7688

77-
Under the hood, Elasticsearch creates inverted indexes that map each unique term to the documents containing it, enabling fast full-text search.
89+
To enable fast full-text search, {{es}} creates inverted indexes that map each unique term to the documents that contain the term.
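
The idea of an inverted index can be sketched in a few lines. This is a toy model with a naive lowercase tokenizer, not how Lucene stores its data. The names `tokenize`, `build_inverted_index`, and `search_all_terms` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each unique term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all_terms(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Illustrative documents keyed by id.
docs = {
    1: "Wireless headphones with noise cancelling",
    2: "Wired headphones",
    3: "Wireless keyboard",
}
index = build_inverted_index(docs)
matches = search_all_terms(index, "wireless headphones")
print(matches)
```

Because the lookup starts from the term instead of scanning each document, the cost of a query depends on the number of matching postings, not on the total number of documents.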
7890

7991
### Query execution [elasticsearch-query-execution]
8092

81-
When you search, Elasticsearch:
93+
To search your data, {{es}} uses distributed query execution.
94+
95+
When you search, {{es}}:
8296

8397
1. Parses your query (e.g., "wireless headphones under $100")
84-
2. Determines which shards might contain matching documents
98+
2. Determines which shards might contain matching documents
8599
3. Executes the query on each relevant shard in parallel
86100
4. Scores results by relevance
87101
5. Merges and sorts results from all shards
88102
6. Returns the top results
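
The steps above follow a scatter-gather pattern, which the following sketch imitates in memory. It assumes each shard is a list of `(score, doc_id)` pairs whose relevance scores are already computed. The names `search_shard` and `distributed_search` are hypothetical, and a thread pool stands in for the nodes of a cluster.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# A hit is a (relevance score, document id) pair; scores are precomputed here.
Hit = tuple[float, str]


def search_shard(shard: list[Hit], size: int) -> list[Hit]:
    """Return the top `size` hits from a single shard, best first."""
    return heapq.nlargest(size, shard)


def distributed_search(shards: list[list[Hit]], size: int) -> list[Hit]:
    """Query every shard in parallel, then merge and keep the global top hits."""
    with ThreadPoolExecutor() as pool:
        per_shard = list(pool.map(lambda shard: search_shard(shard, size), shards))
    merged = [hit for hits in per_shard for hit in hits]
    return heapq.nlargest(size, merged)


# Three toy shards with made-up scores.
shards = [
    [(0.9, "a"), (0.2, "b")],
    [(0.7, "c"), (0.95, "d")],
    [(0.1, "e")],
]
top_hits = distributed_search(shards, size=2)
print(top_hits)
```

Each shard only needs to return its own top `size` hits, because no hit outside a shard's local top `size` can appear in the global top `size`.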
89-
7. This distributed query execution is why Elasticsearch can search petabytes of data in milliseconds.
90103

91104
## Use cases [elasticsearch-use-cases]
92105

93-
Elasticsearch excels in scenarios requiring fast search and analysis across large datasets.
106+
{{es}} is ideal for use cases that require fast search and analysis across large datasets.
94107

95108
### Full-text and hybrid search [elasticsearch-full-text-hybrid-search]
96109

97-
* E-commerce product catalogs: Fast product discovery with filters, facets, and autocomplete
98-
* Enterprise knowledge bases: Search across documents, wikis, and databases with permission controls
99-
* Content platforms: Search articles, videos, and user-generated content by relevance
110+
* E-commerce product catalogs - Fast product discovery with filters, facets, and autocomplete
111+
* Enterprise knowledge bases - Search across documents, wikis, and databases with permission controls
112+
* Content platforms - Search articles, videos, and user-generated content by relevance
100113

101114
### AI-powered applications [elasticsearch-ai-powered-applications]
102115

103-
* Semantic search: Find documents by meaning using vector embeddings from models like BERT or OpenAI
104-
* Chatbots and RAG systems: Retrieve relevant context from knowledge bases to enhance LLM responses
105-
* Recommendation engines: Surface similar items based on vector similarity
116+
* Semantic search - Find documents by meaning using vector embeddings from models like BERT or OpenAI
117+
* Chatbots and RAG systems - Retrieve relevant context from knowledge bases to enhance LLM responses
118+
* Recommendation engines - Surface similar items based on vector similarity
106119

107120
### Geospatial search [elasticsearch-geospatial-search]
108121

109-
* Location-based services: Find nearby restaurants, stores, or services
110-
* Delivery routing: Optimize routes based on geographic data
111-
* Geofencing: Trigger actions when users enter specific areas
122+
* Location-based services - Find nearby restaurants, stores, or services
123+
* Delivery routing - Optimize routes based on geographic data
124+
* Geofencing - Trigger actions when users enter specific areas
112125

113126
### Analytics and monitoring [elasticsearch-analytics-monitoring]
114127

115-
* Log analytics: Centralize and analyze application and system logs
116-
* Security analytics: Detect threats and anomalies in security events
117-
* Business metrics: Analyze user behavior, sales trends, and KPIs
128+
* Log analytics - Centralize and analyze application and system logs
129+
* Security analytics - Detect threats and anomalies in security events
130+
* Business metrics - Analyze user behavior, sales trends, and KPIs
118131

119-
## When to use Elasticsearch [when-to-use-elasticsearch]
132+
## When to use {{es}} [when-to-use-elasticsearch]
120133

121-
Use Elasticsearch when you need:
134+
Use {{es}} when you need:
122135

123136
* Fast search across large volumes of text, numeric, or vector data
124137
* Complex queries with filters, aggregations, and relevance scoring
125-
* Near real-time data availability (seconds, not minutes)
126-
* Scalability to handle growing data volumes
127-
* Flexibility to handle various data types and evolving schemas
138+
* Near real-time access to data
139+
* Scalability to handle growing datasets
140+
* Flexibility to manage diverse data types and evolving schemas
141+
142+
Consider alternatives to {{es}} when:
143+
144+
* You require transactional guarantees and complex joins across multiple entities, which are better handled by relational databases
145+
* Strong consistency is more important than eventual consistency
146+
* Your datasets are small, such as under 1 GB, where simpler solutions suffice
128147

129148
## Architecture considerations [elasticsearch-architecture-considerations]
130149

131150
### Deployment options [elasticsearch-deployment-options]
132151

133-
* Elasticsearch Serverless: Fully managed, auto-scaling deployment (recommended for new projects)
134-
* Elastic Cloud: Managed Elasticsearch with more configuration control
135-
* Self-managed: Install and operate Elasticsearch yourself (requires expertise)
152+
* **[{{serverless-full}}](/deploy-manage/deploy/elastic-cloud/serverless.md)** - Fully managed, auto-scaling deployment, which is recommended for new projects
153+
* **[{{ech}}](/deploy-manage/deploy/elastic-cloud/cloud-hosted.md)** - Managed {{es}} with more configuration control
154+
* **[Self-managed](/deploy-manage/deploy/self-managed.md)** - Install and operate {{es}} yourself, which requires expertise
136155

137156
### Cluster sizing [elasticsearch-cluster-sizing]
138157

139-
* Small deployments: 3-5 nodes for development and small production use cases
140-
* Medium deployments: 10-20 nodes for moderate data volumes and query loads
141-
* Large deployments: 50+ nodes for high-volume production systems
158+
* Small deployments - 3-5 nodes for development and small production use cases
159+
* Medium deployments - 10-20 nodes for moderate data volumes and query loads
160+
* Large deployments - 50 or more nodes for high-volume production systems
142161

143162
### Data modeling best practices [elasticsearch-data-modeling-best-practices]
144163

145-
* One document type per index: Keep related data together
146-
* Denormalize data: Include related information in documents to avoid "joins"
147-
* Use appropriate field types: Match data types to query patterns
148-
* Plan for growth: Consider time-based indices for logs and events
164+
* One document type per index - Keep related data together
165+
* Denormalize data - Include related information in documents to avoid joins
166+
* Use appropriate field types - Match data types to query patterns
167+
* Plan for growth - Consider time-based indices for logs and events
149168

150169
## Next steps [elasticsearch-next-steps]
151170

152-
Ready to try Elasticsearch? Here's how to get started:
171+
Ready to try {{es}}? Here's how to get started:
153172

154-
* Get started with Elasticsearch - Run your first queries in 5 minutes
155-
* Tutorial: Build a search application - Create a full-featured search experience
156-
* Understanding Elasticsearch architecture - Deep dive into distributed systems concepts
173+
* [Get started](/solutions/search/get-started.md) - Run your first queries in 5 minutes
174+
% how* Tutorial: Build a search application - Create a full-featured search experience
175+
* [Understanding {{es}} architecture](/deploy-manage/distributed-architecture.md) - Deep dive into distributed systems concepts
157176

158177
For specific use cases:
159178

160-
* Implementing semantic search - Add AI-powered search
161-
* Building geospatial applications - Work with location data
162-
* Analyzing logs and metrics - Set up observability
179+
* [Implementing semantic search](/solutions/search/get-started/semantic-search.md) - Add AI-powered search
180+
* [Building geospatial applications](/explore-analyze/geospatial-analysis.md) - Work with location data
181+
* [Analyzing logs and metrics](/solutions/observability/get-started.md) - Set up observability
846 KB