Update vector-search.md

jcodella · web-flow · commit 0c0f343a93b6 · 2024-03-19T16:48:26.000-04:00
diff --git a/articles/cosmos-db/mongodb/vcore/vector-search.md b/articles/cosmos-db/mongodb/vcore/vector-search.md
@@ -55,7 +55,9 @@ You can create (Hierarchical Navigable Small World) indexes on M40 cluster tiers
 
 |Field    |Type     |Description  |
 |---------|---------|---------|
-| `kind` | string | Type of vector index to create. Type of vector index to create. `vector-hnsw` is available on M40 cluster tiers and higher.|
+| `index_name` | string | Unique name of the index. |
+| `path_to_property` | string | Path to the property that contains the vector. This path can be a top-level property or a dot notation path to the property. If a dot notation path is used, none of the nonleaf elements can be arrays. Vectors must be a `number[]` to be indexed and returned in vector search results. |
+| `kind` | string | Type of vector index to create. The options are `vector-ivf` and `vector-hnsw`. `vector-ivf` is available on all cluster tiers, and `vector-hnsw` is available on M40 cluster tiers and higher. |
 |`m`        |integer    |The maximum number of connections per layer (`16` by default, minimum value is `2`, maximum value is `100`). A higher `m` is suitable for datasets with high dimensionality, high accuracy requirements, or both.    |
 |`efConstruction` |integer    |The size of the dynamic candidate list for constructing the graph (`64` by default, minimum value is `4`, maximum value is `1000`). A higher `efConstruction` results in better index quality and higher accuracy, but it also increases the time required to build the index. `efConstruction` has to be at least `2 * m`.    |
 |`similarity`     |string     |Similarity metric to use with the index. Possible options are `COS` (cosine distance), `L2` (Euclidean distance), and `IP` (inner product).    |
@@ -113,9 +115,9 @@ To create a vector index using the IVF (Inverted File) algorithm, use the follow
 | --- | --- | --- |
 | `index_name` | string | Unique name of the index. |
 | `path_to_property` | string | Path to the property that contains the vector. This path can be a top-level property or a dot notation path to the property. If a dot notation path is used, none of the nonleaf elements can be arrays. Vectors must be a `number[]` to be indexed and returned in vector search results. |
-| `kind` | string | Type of vector index to create. IVF, `vector-ivf`, is shown in this example. |
+| `kind` | string | Type of vector index to create. The options are `vector-ivf` and `vector-hnsw`. `vector-ivf` is available on all cluster tiers, and `vector-hnsw` is available on M40 cluster tiers and higher. |
 | `numLists` | integer | This integer is the number of clusters that the inverted file (IVF) index uses to group the vector data. We recommend that `numLists` is set to `documentCount/1000` for up to 1 million documents and to `sqrt(documentCount)` for more than 1 million documents. Using a `numLists` value of `1` is akin to performing brute-force search, which has limited performance. |
-| `similarity` | string | Similarity metric to use with the IVF index. Possible options are `COS` (cosine distance), `L2` (Euclidean distance), and `IP` (inner product). |
+| `similarity` | string | Similarity metric to use with the index. Possible options are `COS` (cosine distance), `L2` (Euclidean distance), and `IP` (inner product). |
 | `dimensions` | integer | Number of dimensions for vector similarity. The maximum number of supported dimensions is `2000`. |
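
The `numLists` guidance in the table above reduces to a two-branch formula. The following is a minimal sketch of it. `suggestNumLists` is a hypothetical helper, not part of any Azure Cosmos DB or MongoDB API, and the rounding and the lower bound of `1` are assumptions, because the guidance only gives the ratios.

```javascript
// Hypothetical helper: applies the numLists guidance from the table above.
// documentCount / 1000 for up to 1 million documents, sqrt(documentCount) beyond that.
function suggestNumLists(documentCount) {
  if (!Number.isInteger(documentCount) || documentCount < 1) {
    throw new RangeError("documentCount must be a positive integer");
  }
  const raw = documentCount <= 1000000
    ? documentCount / 1000
    : Math.sqrt(documentCount);
  // Assumption: round to the nearest integer and never go below 1.
  return Math.max(1, Math.round(raw));
}

const smallCollection = suggestNumLists(500000);   // 500000 / 1000
const largeCollection = suggestNumLists(4000000);  // sqrt(4000000)
```

Under these assumptions, 500,000 documents map to `500` lists and 4,000,000 documents map to `2000`. Treat the result as a starting point and measure recall and latency on your own data before settling on a value.
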
 
 > [!IMPORTANT]
@@ -153,7 +155,7 @@ To retrieve the similarity score (`searchScore`) along with the documents found
 > [!IMPORTANT]
 > Vectors must be a `number[]` to be indexed. Using another type, such as `double[]`,  prevents the document from being indexed. Non-indexed documents won't be returned in the result of a vector search.
 
-## Examples
+## Example using an HNSW index
 
 The following examples show you how to index vectors, add documents that have vector properties, perform a vector search, and retrieve the index configuration.
 
@@ -271,6 +273,153 @@ In this example, `vectorIndex` is returned with all the `cosmosSearch` parameter
 ]
 ```
 
+## Example using an IVF index
+
+The following examples show you how to index vectors, add documents that have vector properties, perform a vector search, and retrieve the index configuration.
+
+### Create a vector index
+
+```javascript
+use test;
+
+db.createCollection("exampleCollection");
+
+db.runCommand({
+  createIndexes: 'exampleCollection',
+  indexes: [
+    {
+      name: 'vectorSearchIndex',
+      key: {
+        "vectorContent": "cosmosSearch"
+      },
+      cosmosSearchOptions: {
+        kind: 'vector-ivf',
+        numLists: 3,
+        similarity: 'COS',
+        dimensions: 3
+      }
+    }
+  ]
+});
+```
+
+This command creates a `vector-ivf` index against the `vectorContent` property in the documents that are stored in the specified collection, `exampleCollection`. The `cosmosSearchOptions` property specifies the parameters for the IVF vector index. If your document has the vector stored in a nested property, you can set this property by using a dot notation path. For example, you might use `text.vectorContent` if `vectorContent` is a subproperty of `text`.
+
+### Add vectors to your database
+
+To add vectors to your database's collection, you first need to create the [embeddings](../../../ai-services/openai/concepts/understand-embeddings.md) by using your own model, [Azure OpenAI Embeddings](../../../cognitive-services/openai/tutorials/embeddings.md), or another API (such as [Hugging Face on Azure](https://azure.microsoft.com/solutions/hugging-face-on-azure/)). In this example, new documents are added through sample embeddings:
+
+```javascript
+db.exampleCollection.insertMany([
+  {name: "Eugenia Lopez", bio: "Eugenia is the CEO of AdventureWorks.", vectorContent: [0.51, 0.12, 0.23]},
+  {name: "Cameron Baker", bio: "Cameron Baker is the CFO of AdventureWorks.", vectorContent: [0.55, 0.89, 0.44]},
+  {name: "Jessie Irwin", bio: "Jessie Irwin is the former CEO of AdventureWorks and now the director of the Our Planet initiative.", vectorContent: [0.13, 0.92, 0.85]},
+  {name: "Rory Nguyen", bio: "Rory Nguyen is the founder of AdventureWorks and the president of the Our Planet initiative.", vectorContent: [0.91, 0.76, 0.83]},
+]);
+```
+
+### Perform a vector search
+
+To perform a vector search, use the `$search` aggregation pipeline stage in a MongoDB query. To use the `cosmosSearch` index, use the new `cosmosSearch` operator.
+
+```json
+[
+  {
+    "$search": {
+      "cosmosSearch": {
+        "vector": <vector_to_search>,
+        "path": "<path_to_property>",
+        "k": <num_results_to_return>
+      },
+      "returnStoredSource": true
+    }
+  },
+  {
+    "$project": {
+      "<custom_name_for_similarity_score>": { "$meta": "searchScore" },
+      "document": "$$ROOT"
+    }
+  }
+]
+```
+
+To retrieve the similarity score (`searchScore`) along with the documents found by the vector search, use the `$project` operator to include `searchScore` and rename it as `<custom_name_for_similarity_score>` in the results. The document itself is also projected, as a nested object. The similarity score is calculated by using the metric defined in the vector index.
+
+### Query vectors and vector distances (aka similarity scores) using `$search`
+
+Continuing with the last example, create another vector, `queryVector`. Vector search measures the distance between `queryVector` and the vectors in the `vectorContent` path of your documents. You can set the number of results that the search returns by setting the parameter `k`, which is set to `2` here. You can also set `nProbes`, an integer that controls the number of nearby clusters that are inspected in each search. A higher value might improve accuracy, but the search is slower as a result. `nProbes` is optional, defaults to `1`, and can't be larger than the `numLists` value specified in the vector index.
+
+```javascript
+const queryVector = [0.52, 0.28, 0.12];
+db.exampleCollection.aggregate([
+  {
+    "$search": {
+      "cosmosSearch": {
+        "vector": queryVector,
+        "path": "vectorContent",
+        "k": 2
+      },
+      "returnStoredSource": true
+    }
+  },
+  {
+    "$project": {
+      "similarityScore": { "$meta": "searchScore" },
+      "document": "$$ROOT"
+    }
+  }
+]);
+```
+
+In this example, a vector search is performed by using `queryVector` as an input via the Mongo shell. The search result is a list of two items that are most similar to the query vector, sorted by their similarity scores.
+
+```javascript
+[
+  {
+    similarityScore: 0.9465376,
+    document: {
+      _id: ObjectId("645acb54413be5502badff94"),
+      name: 'Eugenia Lopez',
+      bio: 'Eugenia is the CEO of AdventureWorks.',
+      vectorContent: [ 0.51, 0.12, 0.23 ]
+    }
+  },
+  {
+    similarityScore: 0.9006955,
+    document: {
+      _id: ObjectId("645acb54413be5502badff97"),
+      name: 'Rory Nguyen',
+      bio: 'Rory Nguyen is the founder of AdventureWorks and the president of the Our Planet initiative.',
+      vectorContent: [ 0.91, 0.76, 0.83 ]
+    }
+  }
+]
+```
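
The scores in the sample output can be checked by hand. The following is a minimal sketch that assumes `searchScore` for a `COS` index is the plain cosine similarity between the query vector and the stored vector, which the sample numbers are consistent with. `cosineSimilarity` is an illustrative helper, not a Cosmos DB or MongoDB function.

```javascript
// Illustrative helper: cosine similarity of two equal-length numeric arrays.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Same query vector and stored vectors as in the example above.
const queryVector = [0.52, 0.28, 0.12];
const eugeniaScore = cosineSimilarity(queryVector, [0.51, 0.12, 0.23]);
const roryScore = cosineSimilarity(queryVector, [0.91, 0.76, 0.83]);
const cameronScore = cosineSimilarity(queryVector, [0.55, 0.89, 0.44]);

console.log(eugeniaScore.toFixed(4), roryScore.toFixed(4), cameronScore.toFixed(4));
```

Under that assumption, the first two values agree with the `similarityScore` fields shown above to about four decimal places, and the third is lower, which is why that document isn't among the `k = 2` results.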
+
+### Get vector index definitions
+
+To retrieve your vector index definition from the collection, use the `getIndexes` shell helper, which wraps the `listIndexes` command:
+
+```javascript
+db.exampleCollection.getIndexes();
+```
+
+In this example, `vectorSearchIndex` is returned with all the `cosmosSearch` parameters that were used to create the index:
+
+```javascript
+[
+  { v: 2, key: { _id: 1 }, name: '_id_', ns: 'test.exampleCollection' },
+  {
+    v: 2,
+    key: { vectorContent: 'cosmosSearch' },
+    name: 'vectorSearchIndex',
+    cosmosSearch: {
+      kind: 'vector-ivf',
+      numLists: 3,
+      similarity: 'COS',
+      dimensions: 3
+    },
+    ns: 'test.exampleCollection'
+  }
+]
+```
+
 ## Filtered vector search (preview)
 You can now execute vector searches with any supported query filter such as `$lt, $lte, $eq, $neq, $gte, $gt, $in, $nin, and $regex`. Enable the "filtering vector search" feature in the "Preview Features" tab of your Azure Subscription. Learn more about preview features [here](../../../azure-resource-manager/management/preview-features.md).
 
@@ -310,7 +459,7 @@ db.exampleCollection.aggregate([
 > [!IMPORTANT]
 > While in preview, filtered vector search may require you to adjust your vector index parameters to achieve higher accuracy. For example, increasing `m`, `efConstruction`, or `efSearch` when using HNSW, or `numLists`, or `nProbes` when using IVF, may lead to better results. You should test your configuration before use to ensure that the results are satisfactory. 
 
-## Use LLM Orchestration tools such
+## Use LLM Orchestration tools
 
 ### Use as a vector database with Semantic Kernel
 Use Semantic Kernel to orchestrate your information retrieval from Azure Cosmos DB for MongoDB vCore and your LLM. Learn more [here](https://github.com/microsoft/semantic-kernel/tree/main/python/semantic_kernel/connectors/memory/azure_cosmosdb).