
Commit bdf6864

[DOCS] Update documents and indices overview (#112394) (#112429)
1 parent ed1ecce commit bdf6864


1 file changed: +92 -51 lines


docs/reference/intro.asciidoc

Lines changed: 92 additions & 51 deletions
@@ -55,66 +55,107 @@ You can deploy {es} in various ways:
 [[elasticsearch-next-steps]]
 === Learn more
 
-Here are some resources to help you get started:
+Some resources to help you get started:
 
 * <<getting-started, Quickstart>>. A beginner's guide to deploying your first {es} instance, indexing data, and running queries.
 * https://elastic.co/webinars/getting-started-elasticsearch[Webinar: Introduction to {es}]. Register for our live webinars to learn directly from {es} experts.
 * https://www.elastic.co/search-labs[Elastic Search Labs]. Tutorials and blogs that explore AI-powered search using the latest {es} features.
 ** Follow our tutorial https://www.elastic.co/search-labs/tutorials/search-tutorial/welcome[to build a hybrid search solution in Python].
 ** Check out the https://github.com/elastic/elasticsearch-labs?tab=readme-ov-file#elasticsearch-examples--apps[`elasticsearch-labs` repository] for a range of Python notebooks and apps for various use cases.
 
+// new html page
 [[documents-indices]]
-=== Documents and indices
-
-{es} is a distributed document store. Instead of storing information as rows of
-columnar data, {es} stores complex data structures that have been serialized
-as JSON documents. When you have multiple {es} nodes in a cluster, stored
-documents are distributed across the cluster and can be accessed immediately
-from any node.
-
-When a document is stored, it is indexed and fully searchable in <<near-real-time,near real-time>>--within 1 second. {es} uses a data structure called an
-inverted index that supports very fast full-text searches. An inverted index
-lists every unique word that appears in any document and identifies all of the
-documents each word occurs in.
-
-An index can be thought of as an optimized collection of documents and each
-document is a collection of fields, which are the key-value pairs that contain
-your data. By default, {es} indexes all data in every field and each indexed
-field has a dedicated, optimized data structure. For example, text fields are
-stored in inverted indices, and numeric and geo fields are stored in BKD trees.
-The ability to use the per-field data structures to assemble and return search
-results is what makes {es} so fast.
-
-{es} also has the ability to be schema-less, which means that documents can be
-indexed without explicitly specifying how to handle each of the different fields
-that might occur in a document. When dynamic mapping is enabled, {es}
-automatically detects and adds new fields to the index. This default
-behavior makes it easy to index and explore your data--just start
-indexing documents and {es} will detect and map booleans, floating point and
-integer values, dates, and strings to the appropriate {es} data types.
-
-You can define rules to control dynamic mapping and explicitly
-define mappings to take full control of how fields are stored and indexed.
-
-Defining your own mappings enables you to:
-
-* Distinguish between full-text string fields and exact value string fields
-* Perform language-specific text analysis
-* Optimize fields for partial matching
-* Use custom date formats
-* Use data types such as `geo_point` and `geo_shape` that cannot be automatically
-detected
-
-It’s often useful to index the same field in different ways for different
-purposes. For example, you might want to index a string field as both a text
-field for full-text search and as a keyword field for sorting or aggregating
-your data. Or, you might choose to use more than one language analyzer to
-process the contents of a string field that contains user input.
-
-The analysis chain that is applied to a full-text field during indexing is also
-used at search time. When you query a full-text field, the query text undergoes
-the same analysis before the terms are looked up in the index.
+=== Indices, documents, and fields
+++++
+<titleabbrev>Indices and documents</titleabbrev>
+++++
 
+The index is the fundamental unit of storage in {es}, a logical namespace for storing data that share similar characteristics.
+After you have {es} <<elasticsearch-intro-deploy,deployed>>, you'll get started by creating an index to store your data.
+
+[TIP]
+====
+A closely related concept is a <<data-streams,data stream>>.
+This index abstraction is optimized for append-only time-series data, and is made up of hidden, auto-generated backing indices.
+If you're working with time-series data, we recommend the {observability-guide}[Elastic Observability] solution.
+====
+
+Some key facts about indices:
+
+* An index is a collection of documents
+* An index has a unique name
+* An index can also be referred to by an alias
+* An index has a mapping that defines the schema of its documents
+
+[discrete]
+[[elasticsearch-intro-documents-fields]]
+==== Documents and fields
+
+{es} serializes and stores data in the form of JSON documents.
+A document is a set of fields, which are key-value pairs that contain your data.
+Each document has a unique ID, which you can create or have {es} auto-generate.
+
+A simple {es} document might look like this:
+
+[source,js]
+----
+{
+  "_index": "my-first-elasticsearch-index",
+  "_id": "DyFpo5EBxE8fzbb95DOa",
+  "_version": 1,
+  "_seq_no": 0,
+  "_primary_term": 1,
+  "found": true,
+  "_source": {
+    "email": "[email protected]",
+    "first_name": "John",
+    "last_name": "Smith",
+    "info": {
+      "bio": "Eco-warrior and defender of the weak",
+      "age": 25,
+      "interests": [
+        "dolphins",
+        "whales"
+      ]
+    },
+    "join_date": "2024/05/01"
+  }
+}
+----
+// NOTCONSOLE
+
+[discrete]
+[[elasticsearch-intro-documents-fields-data-metadata]]
+==== Data and metadata
+
+An indexed document contains data and metadata.
+In {es}, metadata fields are prefixed with an underscore.
+
+The most important metadata fields are:
+
+* `_source`. Contains the original JSON document.
+* `_index`. The name of the index where the document is stored.
+* `_id`. The document's ID. IDs must be unique per index.
+
+[discrete]
+[[elasticsearch-intro-documents-fields-mappings]]
+==== Mappings and data types
+
+Each index has a <<mapping,mapping>> or schema for how the fields in your documents are indexed.
+A mapping defines the <<mapping-types,data type>> for each field, how the field should be indexed,
+and how it should be stored.
+When adding documents to {es}, you have two options for mappings:
+
+* <<mapping-dynamic, Dynamic mapping>>. Let {es} automatically detect the data types and create the mappings for you. This is great for getting started quickly.
+* <<mapping-explicit, Explicit mapping>>. Define the mappings up front by specifying data types for each field. Recommended for production use cases.
+
+[TIP]
+====
+You can use a combination of dynamic and explicit mapping on the same index.
+This is useful when you have a mix of known and unknown fields in your data.
+====
+
+// New html page
 [[search-analyze]]
 === Search and analyze
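The overview text removed in this commit makes two points. First, an inverted index lists every unique word and the documents each word occurs in. Second, the analysis chain applied at index time is reused on the query text at search time. The Python sketch below illustrates both ideas in miniature and is not how Elasticsearch or Lucene implements them. `analyze`, which lowercases text and splits it on non-alphanumeric characters, is a hypothetical stand-in for a real analyzer. Requiring every query term to match is a simplifying choice. The second sample document is made up.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    """Hypothetical toy analyzer: lowercase, then split on runs of non-alphanumeric characters."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every unique term to the set of IDs of the documents it occurs in."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Analyze the query with the same analyzer used at index time, then return
    the IDs of documents that contain every query term (a simplification)."""
    terms = analyze(query)
    if not terms:
        return set()
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return matches


# Sample documents: "1" reuses the example `bio` string; "2" is made up.
docs = {
    "1": "Eco-warrior and defender of the weak",
    "2": "Defender of dolphins and whales",
}
idx = build_inverted_index(docs)
print(sorted(idx["defender"]))                    # ['1', '2']
print(sorted(search(idx, "DEFENDER of Whales")))  # ['2']
```

The query `DEFENDER of Whales` is run through the same `analyze` step, so its terms line up with the lowercase terms stored in the index. Looking up the raw string `DEFENDER` instead would find nothing.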

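The removed paragraph on schema-less indexing and the new *Dynamic mapping* bullet both describe Elasticsearch inferring field types from JSON values. The sketch below mimics that inference for a document's `_source`. It is modeled on the documented default choices:

* null adds no field.
* Booleans map to `boolean`.
* Whole numbers map to `long`.
* Floating-point numbers map to `float`.
* Objects map to `object`.
* Arrays take the type of their first non-null element.
* Other strings map to `text` with a `keyword` sub-field.

Date detection is deliberately simplified. The `DATE_LIKE` regex is a hypothetical placeholder for Elasticsearch's configurable `dynamic_date_formats`, so treat the `date` result as illustrative only.

```python
import re
from typing import Any, Optional

# Hypothetical stand-in for date detection. Elasticsearch checks strings against
# its configurable `dynamic_date_formats`; this regex is an assumption.
DATE_LIKE = re.compile(r"^\d{4}[-/]\d{2}[-/]\d{2}$")


def detect_type(value: Any) -> Optional[dict]:
    """Return a mapping for one JSON value, or None if no field would be added."""
    if value is None:
        return None  # null values do not create a field
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if DATE_LIKE.match(value):
            return {"type": "date"}
        # Default for plain strings: `text` plus a `keyword` sub-field.
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    if isinstance(value, list):
        # An array is mapped by its first non-null element.
        for item in value:
            if item is not None:
                return detect_type(item)
        return None
    if isinstance(value, dict):
        return {"properties": infer_mapping(value)}  # inner objects map to `object`
    raise TypeError(f"not a JSON value: {value!r}")


def infer_mapping(source: dict) -> dict:
    """Infer a `properties` block for every non-null field of a document."""
    properties = {}
    for field, value in source.items():
        mapping = detect_type(value)
        if mapping is not None:
            properties[field] = mapping
    return properties


# A subset of the example document's `_source`.
source = {
    "first_name": "John",
    "info": {"age": 25, "interests": ["dolphins", "whales"]},
    "join_date": "2024/05/01",
}
mapping = infer_mapping(source)
```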

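The new *Data and metadata* section says that metadata fields are prefixed with an underscore and that `_source` holds the original JSON document. A minimal Python sketch, using the example GET response from this commit with the `email` value omitted, separates the two parts. Following that naming convention, it treats every underscore-prefixed key other than `_source` as metadata. `found` belongs to the GET response envelope rather than to the document, so it is left out of both parts.

```python
import json

# The example response from the docs, with the "email" field omitted.
RESPONSE = """
{
  "_index": "my-first-elasticsearch-index",
  "_id": "DyFpo5EBxE8fzbb95DOa",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "first_name": "John",
    "last_name": "Smith",
    "info": {
      "bio": "Eco-warrior and defender of the weak",
      "age": 25,
      "interests": ["dolphins", "whales"]
    },
    "join_date": "2024/05/01"
  }
}
"""


def split_response(doc: dict) -> tuple[dict, dict]:
    """Return (underscore-prefixed keys except _source, the original document in _source)."""
    metadata = {k: v for k, v in doc.items() if k.startswith("_") and k != "_source"}
    return metadata, doc.get("_source", {})


metadata, source = split_response(json.loads(RESPONSE))
print(sorted(metadata))             # ['_id', '_index', '_primary_term', '_seq_no', '_version']
print(source["info"]["interests"])  # ['dolphins', 'whales']
```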
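The new *Mappings and data types* section recommends explicit mappings for production and notes that dynamic and explicit mapping can be combined on one index. The removed text adds that the same string can be indexed twice: as `text` for full-text search and as `keyword` for sorting or aggregations. The sketch below builds a create-index request body that does all three for the example document. Several choices are hypothetical and only illustrate the mechanisms:

* which fields get a `raw` keyword sub-field
* the built-in `english` analyzer on `bio`
* the `yyyy/MM/dd` date format
* the `strings_as_keywords` dynamic-template name

The body would be sent with `PUT /my-first-elasticsearch-index`.

```python
import json

# Hypothetical explicit mapping for the example document; every field choice
# below is an illustrative assumption, not a recommendation from the docs.
create_index_body = {
    "mappings": {
        # Rule for dynamically added fields: map new strings as `keyword` only.
        "dynamic_templates": [
            {
                "strings_as_keywords": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ],
        # Explicitly mapped (known) fields.
        "properties": {
            "email": {"type": "keyword"},
            "first_name": {
                "type": "text",                          # full-text search
                "fields": {"raw": {"type": "keyword"}},  # exact match, sorting, aggregations
            },
            "last_name": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},
            },
            "info": {
                "properties": {
                    "bio": {"type": "text", "analyzer": "english"},  # language-specific analysis
                    "age": {"type": "integer"},
                    "interests": {"type": "keyword"},
                }
            },
            "join_date": {"type": "date", "format": "yyyy/MM/dd"},  # custom date format
        },
    }
}

# Request body for `PUT /my-first-elasticsearch-index`.
payload = json.dumps(create_index_body, indent=2)
print(payload)
```

Fields listed under `properties` are mapped explicitly. Dynamic mapping stays enabled by default, so a new string-valued field added later (other than one detected as a date) is matched by the `strings_as_keywords` template. It is then mapped as `keyword` instead of the default `text` plus `keyword` pair.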