Commit 46017ce

committed
Checkpoint, many content changes/formatting
1 parent f133473 commit 46017ce

17 files changed

+342
-129
lines changed

docs/10_About_workshop/1_intro.mdx

Lines changed: 11 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -19,10 +19,18 @@ To begin, navigate to the [Atlas Search Playground](https://search-playground.mo
1919
In the next section, you'll work through the first exercise to get familiar with the
2020
Playground's Code Sandbox.
2121

22-
Let's dive into the world of Atlas Search using this convenient and powerful playground!
22+
### Run button
23+
24+
After you make changes to any of the Playground areas, press the `Run` button to execute
25+
the aggregation pipeline.
26+
27+
![Run button](/img/playground_run.png)
2328

2429
## Resources
25-
* https://www.mongodb.com/developer/products/atlas/search-playground-intro/
30+
* https://www.mongodb.com/developer/products/atlas/search-playground-intro/
31+
32+
------
33+
34+
Let's dive into the world of Atlas Search using this convenient and powerful playground environment!
2635

2736

28-
* `index` parameter to $search!!!

docs/10_About_workshop/2_lets_go.mdx

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -12,9 +12,7 @@ we'll get there next. See if you can solve it with what you already know or expe
1212

1313
The objective with this exercise is to adjust the `FIX_ME` in the `$search` aggregation pipeline stage so that the document (in the Data Source pane) matches the query and appears in the Results pane.
1414

15-
![Playground intro exercise](/img/playground_intro_exercise.png)
16-
17-
![Run button](/img/playground_run.png)
15+
[![Playground intro exercise](/img/playground_intro_exercise.png)](/img/playground_intro_exercise.png)
1816

1917
<details>
2018
<summary>Here's a solution...</summary>

docs/20_Intro_to_Atlas_Search/1_system.mdx

Lines changed: 18 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,9 +4,26 @@ Atlas Search provides powerful findability capabilities to your data collections
44
A flexible index configuration allows mapping and indexing only the fields needed,
55
or dynamically mapping any and all fields supported.
66

7+
## The Big Picture
8+
9+
Applications communicate with Atlas Search through the same mechanism as
10+
all other requests: an aggregation pipeline through a MongoDB driver.
11+
712
![big picture](/img/big_picture.png)
813

9-
![system diagram](/img/system_diagram.png)
14+
## System architecture
15+
16+
The Atlas Search `mongot` process, built on Apache Lucene, interfaces with the `mongod`
17+
database process to create and manage full-text (and vector search) indexes and queries.
18+
19+
The `mongot` process performs the following tasks:
20+
* Creates Atlas Search indexes based on the index definition.
21+
* Monitors change streams for the current state of the documents.
22+
* Processes Atlas Search queries and returns the document IDs and other search metadata for
23+
the matching documents to `mongod`, which then does a full document lookup and returns the
24+
results to the client.
25+
26+
[![system diagram](/img/system_diagram.png)](/img/system_diagram.png)
1027

1128
Changes to a collection via updates, deletes, or additions are *eventually consistent*, meaning the
1229
index is updated independently of changes to the collection in a separate process, asynchronously.

docs/20_Intro_to_Atlas_Search/2_aggregation_stages.mdx

Lines changed: 40 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,9 @@
11
# 👐 Aggregation pipeline search stages
22

3+
The search stages must be the first stage in a pipeline, as they do not accept incoming
4+
documents but rather only emit documents. There are two stages available, depending on
5+
the particular needs.
6+
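As a minimal sketch of the ordering rule above, the following JavaScript builds a pipeline whose first stage is `$search`, followed by ordinary stages. The `title` field, the query text, and the `startsWithSearchStage` helper are assumptions made for illustration; they are not part of the workshop material or of any driver API.

```js
// Hypothetical pipeline: the field name `title` and the query text are assumptions.
const pipeline = [
  // The search stage comes first; it emits documents rather than consuming them.
  { $search: { index: "default", text: { query: "playground", path: "title" } } },
  // Ordinary aggregation stages can follow and operate on the emitted documents.
  { $limit: 5 },
  { $project: { _id: 0, title: 1, score: { $meta: "searchScore" } } }
];

// Illustrative helper (not an Atlas or driver API): checks that a pipeline
// begins with one of the two search stages.
function startsWithSearchStage(stages) {
  if (stages.length === 0) return false;
  const firstStageName = Object.keys(stages[0])[0];
  return firstStageName === "$search" || firstStageName === "$searchMeta";
}
```

A check like this could catch a misplaced `$match` before the pipeline is sent to the server, although the server reports the same mistake as an error.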
37
## $search
48

59
Returns matching documents.
@@ -10,59 +14,57 @@ Results are returned in descending **score** order or in an optional `sort` orde
1014
## $searchMeta
1115

1216
Returns a single document of search result metadata including count of matching documents
13-
and facets requested. No actual collection documents are returned.
17+
and any facets requested. No actual collection documents are returned.
1418

1519
The `$searchMeta` stage performs the same search that `$search` does,
1620
but only returns the results metadata, not actual matching documents.
1721
Results metadata includes the count of matching results and facets.
1822
This same metadata is available when using `$search` too,
1923
accessible in the $$SEARCH_META context variable.
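The difference between the two stages can be sketched as a pair of pipelines. This is an illustration only: the `title` field and the query text are assumed, and the `count` option is included so that the metadata contains a total.

```js
// Assumed query shared by both pipelines; `title` is a hypothetical field.
const query = {
  index: "default",
  text: { query: "playground", path: "title" },
  count: { type: "total" }
};

// $searchMeta: emits a single metadata document, no collection documents.
const metaOnlyPipeline = [{ $searchMeta: query }];

// $search: emits matching documents; the same metadata is reachable
// through the $$SEARCH_META variable in a later stage.
const docsWithMetaPipeline = [
  { $search: query },
  { $limit: 5 },
  { $project: { _id: 0, title: 1, meta: "$$SEARCH_META" } }
];
```

With the second pipeline, each returned document would carry a copy of the metadata in its `meta` field, which is convenient when one round trip should supply both results and counts.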
2024

21-
22-
2325
## Exercises: search pipeline stages
2426

2527
### Step 1
26-
1. Navigate to the original Playground used in the last section's exercise
27-
https://search-playground.mongodb.com/tools/code-sandbox/snapshots/6782aea0667feaaf06324b87
28-
2. Press Run. Got the empty `[]` array of results?
29-
3. Change `$search` to `$searchMeta` (in the Query pane), and press Run again.
30-
31-
<details>
32-
<summary>Here's the expected results...</summary>
33-
<div>
34-
```js
35-
[
36-
{
37-
"count": {
38-
"lowerBound": 0
28+
1. Navigate to the original Playground used in the last section's exercise
29+
https://search-playground.mongodb.com/tools/code-sandbox/snapshots/6782aea0667feaaf06324b87
30+
2. Press Run. Got the empty `[]` array of results?
31+
3. Change `$search` to `$searchMeta` (in the Query pane), and press Run again.
32+
33+
<details>
34+
   <summary>Here are the expected results...</summary>
35+
<div>
36+
```js
37+
[
38+
{
39+
"count": {
40+
"lowerBound": 0
41+
}
3942
}
40-
}
41-
]
42-
```
43-
</div>
44-
</details>
43+
]
44+
```
45+
</div>
46+
</details>
4547

4648
### Step 2
4749

48-
1. Now fix the query to match the document as you did previously
49-
2. Press Run again
50-
3. Did the `$searchMeta` results change?
51-
52-
<details>
53-
<summary>Here's the expected results...</summary>
54-
<div>
55-
```js
56-
[
57-
{
58-
"count": {
59-
"lowerBound": 1
50+
1. Now fix the query to match the document as you did previously
51+
2. Press Run again
52+
3. Did the `$searchMeta` results change?
53+
54+
<details>
55+
   <summary>Here are the expected results...</summary>
56+
<div>
57+
```js
58+
[
59+
{
60+
"count": {
61+
"lowerBound": 1
62+
}
6063
}
61-
}
62-
]
63-
```
64-
</div>
65-
</details>
64+
]
65+
```
66+
</div>
67+
</details>
6668

6769
## Post $search-stages
6870

Lines changed: 46 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,27 +1,63 @@
11
# 📘 Powered by Lucene
22

3-
https://lucene.apache.org
3+
[Lucene](https://lucene.apache.org) is a Java library providing powerful indexing and search
4+
features, as well as spellchecking, hit highlighting, and advanced analysis/tokenization
5+
capabilities. You have almost certainly used Lucene already, perhaps without even realizing it.
6+
It powers the search facilities of countless websites and applications, both public and private.
47

58
## Anatomy of a Lucene index
69

7-
A Lucene index encapsulates specialized data structures unique to each type of data indexed.
10+
A Lucene index encapsulates specialized data structures unique to each type of data indexed.
811

9-
* Numbers and dates: ...
10-
* Geo-spatial: ...
12+
* Numbers, dates, geo-spatial points: Indexed into a k-d structure
13+
* Vectors: Hierarchical Navigable Small Worlds (HNSW) data structure
1114
* Text: via inverted indexes
1215

13-
Each field is indexed independently.
16+
What Lucene, and Atlas Search, call an "index" is really a collection of separate individual
17+
per-field data structures.
1418

15-
Segmented architecture, append-only, for fast indexing. Background processes to optimize the index
16-
segments.
19+
Lucene is designed for both fast searches and speedy indexing. The indexing speed derives from its
20+
append-only segmented architecture. When new documents are indexed, they are added to new segments.
21+
When that indexing session is complete, the new segments are opened and blended with all the other
22+
active segments. Background processes optimize the index segments by combining them to form larger,
23+
and fewer, segments over time.
24+
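The append-then-merge idea can be sketched with a toy model. This is not Lucene's merge policy: the `ToySegmentedIndex` class, its `maxSegments` threshold, and the "merge the two smallest segments" rule are all assumptions chosen to keep the example short, and merging runs inline here rather than in a background process.

```js
// Toy model of an append-only, segmented index (illustrative only).
class ToySegmentedIndex {
  constructor(maxSegments = 3) {
    this.segments = [];            // each segment is an immutable array of docs
    this.maxSegments = maxSegments; // assumed threshold that triggers merging
  }

  // Each indexing session writes a brand-new segment; existing ones are untouched.
  addBatch(docs) {
    this.segments.push(Object.freeze([...docs]));
    this.mergeIfNeeded();
  }

  // Combine the two smallest segments until the segment count is acceptable.
  mergeIfNeeded() {
    while (this.segments.length > this.maxSegments) {
      this.segments.sort((a, b) => a.length - b.length);
      const [first, second, ...rest] = this.segments;
      this.segments = [...rest, Object.freeze([...first, ...second])];
    }
  }

  get docCount() {
    return this.segments.reduce((total, segment) => total + segment.length, 0);
  }
}
```

Adding many small batches to this model leaves a handful of larger segments while the document count stays the same, which is the effect the paragraph above describes.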
25+
A single Lucene index can handle up to 2 billion documents. There is generally a 1-1 correspondence
26+
between documents in your collection and Lucene documents, with the exception of nested documents
27+
mapped as `embeddedDocuments` (a topic covered later). To differentiate the terminology, Atlas Search
28+
calls the documents in a Lucene index "index objects". See
29+
the [index size and configuration documentation](https://www.mongodb.com/docs/atlas/atlas-search/performance/index-performance/#index-size-and-configuration)
30+
for more details.
1731

1832
## Inverted Index
1933

34+
Textual content is the heart and soul of Lucene. `string` fields are analyzed. The output of the
35+
analysis process is a series of **terms**. Terms are generally a normalized version of the individual
36+
words of the text. These terms are then organized into an **inverted index** data structure.
37+
This data structure is lexicographically (or alphabetically) ordered. The following image illustrates
38+
an inverted index built from 3 documents, each with a single string field.
39+
2040
![inverted index](/img/analysis_lucene_standard.png)
2141

42+
Along with an ordered dictionary of terms, corpus and document-level statistics are also collected into
43+
the inverted index structure. These statistics include:
44+
45+
* term frequency (`tf`): the number of times a term occurs in the field
46+
* document frequency (`df`): how many documents contain the term
47+
* field length: how many terms there are in each field
48+
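To make these statistics concrete, here is a small sketch that builds an inverted index over three single-field documents. The `analyze` function is a stand-in for a real analyzer (it only lowercases and splits on non-alphanumeric characters), and the sample documents and function names are assumptions for illustration.

```js
// Toy analyzer: lowercase the text and split it into terms.
function analyze(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Builds term -> (docId -> tf) postings plus per-document field lengths.
function buildInvertedIndex(docs) {
  const postings = new Map();
  const fieldLengths = new Map();
  for (const { id, text } of docs) {
    const terms = analyze(text);
    fieldLengths.set(id, terms.length); // field length: number of terms
    for (const term of terms) {
      if (!postings.has(term)) postings.set(term, new Map());
      const perDoc = postings.get(term);
      perDoc.set(id, (perDoc.get(id) || 0) + 1); // term frequency (tf)
    }
  }
  // The term dictionary is kept in lexicographic order.
  const dictionary = [...postings.keys()].sort();
  // Document frequency (df): how many documents contain the term.
  const df = (term) => (postings.has(term) ? postings.get(term).size : 0);
  return { dictionary, postings, fieldLengths, df };
}

const sampleIndex = buildInvertedIndex([
  { id: 1, text: "The quick brown fox" },
  { id: 2, text: "Quick quick fox" },
  { id: 3, text: "Brown dog" }
]);
```

In this sample, `quick` has a document frequency of 2 and a term frequency of 2 in document 2, and document 1 has a field length of 4.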
2249
## Search algorithms
2350

24-
* "index intersection" using skip lists
25-
* link to Adrien's presentation
51+
Lucene queries leverage the data structures built at index-time to quickly find, and rank, matching
52+
documents. The technique known as "index intersection" shines when searching across multiple fields in a single
53+
query.
54+
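The core of index intersection can be sketched as a merge of two sorted lists of document IDs, one per query clause. This is a simplified linear version under assumed inputs; Lucene's real implementation can skip ahead within postings rather than visiting every entry, and the function name and sample IDs below are hypothetical.

```js
// Intersects two ascending lists of document IDs.
// Returns the IDs present in both lists, in ascending order.
function intersectPostings(left, right) {
  const matches = [];
  let i = 0;
  let j = 0;
  while (i < left.length && j < right.length) {
    if (left[i] === right[j]) {
      matches.push(left[i]); // document satisfies both clauses
      i++;
      j++;
    } else if (left[i] < right[j]) {
      i++; // advance the list that is behind
    } else {
      j++;
    }
  }
  return matches;
}

// Assumed postings: documents matching a text clause and a boolean clause.
const titleMatches = [1, 4, 7, 9, 12];
const inStockMatches = [2, 4, 9, 10];
const bothMatch = intersectPostings(titleMatches, inStockMatches);
```

Because both lists are sorted, the walk never revisits an entry, so the cost grows with the combined length of the lists rather than with the size of the whole collection.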
55+
Atlas Search translates its search operators directly to Lucene's `Query` API.
56+
57+
## Resources
2658

27-
Atlas Search translates its search operators to Lucene's `Query` API.
59+
* ["What is in a Lucene index"](https://www.youtube.com/watch?v=T5RmMNDR5XI) - a very
60+
educational presentation delivered by Lucene project committer Adrien Grand.
61+
* ["Visualizing Lucene's segment merges"](https://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html)
62+
  - an illustrative set of animations on how Lucene keeps itself optimized, balancing both
63+
indexing and searching needs.
Lines changed: 74 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,24 +1,88 @@
11
# 📘 Index configuration
22

3-
Documents are mapped to an index through a flexible configuration.
4-
5-
* https://www.mongodb.com/docs/atlas/atlas-search/define-field-mappings/
3+
Documents are indexed in Lucene using a flexible configuration specification.
64

75
## Supported types
86

9-
## Dynamic mapping
7+
The type of a field determines how it can be indexed, and thus how it can be searched.
8+
9+
Most field types are supported, including all the basic types such as booleans, dates, numerics,
10+
and strings, as well as ObjectId, UUID, and GeoJSON. Null values are also supported implicitly.
11+
12+
## Mapping
13+
14+
A JSON-formatted index configuration mapping specifies which fields are indexed.
15+
16+
Fields can be mapped explicitly, by name (and path), or dynamically, or a combination of
17+
both.
18+
19+
### Dynamic mapping
20+
21+
An index configuration defaults to a fully dynamic mapping:
22+
23+
```
24+
{
25+
"mappings": {
26+
"dynamic": true
27+
}
28+
}
29+
```
30+
31+
A dynamic mapping indexes all *dynamically supported* fields automatically. Dynamic mapping alleviates
32+
having to specify every field explicitly, which would be arduous in situations where there are many
33+
fields, or where new fields could be added to documents over time.
1034

11-
You can configure an entire index to use dynamic mappings, or specify individual fields,
12-
such as fields of type `document`, to be dynamically mapped.
35+
You can configure an entire index to use dynamic mappings, or have only the fields within nested documents
36+
be dynamically mapped.
1337

14-
## Configuring a real Atlas Search index
38+
Not all data types are supported with dynamic mapping. Most notably, GeoJSON fields require explicit
39+
mapping.
40+
41+
If `mappings.dynamic` is set to `false`, at least one field must be explicitly mapped.
42+
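As a sketch of combining the two approaches, the definition below turns dynamic mapping off at the top level and enables it only inside one nested document. The field name `metadata` is an assumption for illustration, and the definition is written as a JavaScript object so it can be passed to a driver or pasted into a JSON editor after conversion.

```js
// Hypothetical index definition: only fields under `metadata` are indexed,
// and they are mapped dynamically.
const nestedDynamicIndexDefinition = {
  mappings: {
    dynamic: false, // nothing at the top level is indexed automatically
    fields: {
      metadata: {
        type: "document", // `metadata` is a nested document
        dynamic: true     // its dynamically supported fields are indexed
      }
    }
  }
};
```

Because `mappings.dynamic` is `false`, the explicit `metadata` entry is what satisfies the requirement that at least one field be mapped.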
43+
## Static/explicit field mapping
44+
45+
To explicitly specify a field's mapping, list it in the `mappings.fields` section of the configuration.
46+
47+
An example:
48+
49+
```
50+
{
51+
"mappings": {
52+
"dynamic": false,
53+
"fields": {
54+
"in_stock": [
55+
{
56+
"type": "boolean"
57+
}
58+
]
59+
}
60+
}
61+
}
62+
```
63+
Using that mapping, this document would only have the `in_stock` field indexed, not the `name` field.
64+
65+
```
66+
{
67+
_id: 1,
68+
name: "Product One",
69+
in_stock: true
70+
}
71+
```
72+
73+
## Configuring an Atlas Search index
74+
75+
Outside of the Playground, you have several options to set up and configure a persistent
76+
Atlas Search index.
1577

1678
* Atlas Search Visual Editor or JSON Editor
1779
* via Compass
1880
* Atlas CLI
1981
* Driver commands
2082

21-
## Configuration options
83+
The Atlas Search Visual Editor is a good place to start, to become familiar with the syntax
84+
and options available.
85+
86+
## Resources
2287

23-
* `storedSource`
24-
* `synonyms`
88+
* https://www.mongodb.com/docs/atlas/atlas-search/define-field-mappings/

docs/30_Index_configuration/2_basic_types.mdx

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,3 +28,31 @@ mapped type.
2828
## ObjectId
2929
* `equals`: https://search-playground.mongodb.com/tools/code-sandbox/snapshots/678506c2b6487c1cfd0bb540
3030
* `in`:
31+
32+
## Exercises
33+
34+
### boolean equals
35+
36+
Why doesn't this playground match as expected?
37+
* https://search-playground.mongodb.com/tools/code-sandbox/snapshots/6787a56ed8892198a2a8f3f4
38+
39+
<details>
40+
<summary>Explanation</summary>
41+
<div>
42+
Because `$search.equals.value` is the string `"true"`, not an actual boolean.
43+
Here's a corrected pipeline:
44+
```js
45+
[
46+
{
47+
$search: {
48+
index: "default",
49+
equals: {
50+
value: true,
51+
path: "in_stock"
52+
}
53+
}
54+
}
55+
]
56+
```
57+
</div>
58+
</details>
