**docs/knn/approximate-knn.md** (+3 −3)

```diff
@@ -2,7 +2,7 @@
 layout: default
 title: Approximate Search
 nav_order: 1
-parent: KNN
+parent: k-NN
 has_children: false
 has_math: true
 ---
@@ -11,13 +11,13 @@ has_math: true

 The approximate k-NN method uses [nmslib's](https://github.com/nmslib/nmslib/) implementation of the HNSW algorithm to power k-NN search. In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three methods, this method offers the best search scalability for large data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach should be preferred.

-This plugin builds an HNSW graph of the vectors for each "knn-vector field"/"Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. These graphs are loaded into native memory during search and managed by a cache. To pre-load the graphs into memory, please refer to the [warmup API](../api#Warmup). In order to see what graphs are loaded in memory as well as other stats, please refer to the [stats API](../api#Stats). To learn more about segments, please refer to [Apache Lucene's documentation](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters will be applied on the results produced by the approximate nearest neighbor search.
+This plugin builds an HNSW graph of the vectors for each "knn-vector field"/"Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. These graphs are loaded into native memory during search and managed by a cache. To pre-load the graphs into memory, please refer to the [warmup API](api#Warmup). In order to see what graphs are loaded in memory as well as other stats, please refer to the [stats API](api#Stats). To learn more about segments, please refer to [Apache Lucene's documentation](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters will be applied on the results produced by the approximate nearest neighbor search.

 ## Get started with approximate k-NN

 To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with the index setting, `index.knn` to `true`. This setting tells the plugin to create HNSW graphs for the index.

-Additionally, if you are using the approximate k-nearest neighbor method, you should specify `knn.space_type` to the space that you are interested in. This setting cannot be changed after it is set. Please refer to the [spaces section](#spaces) to see what spaces we support! By default, `index.knn.space_type` is `l2`. For more information on index settings, such as algorithm parameters that can be tweaked to tune performance, please refer to the [documentation](../settings#IndexSettings).
+Additionally, if you are using the approximate k-nearest neighbor method, you should specify `knn.space_type` to the space that you are interested in. This setting cannot be changed after it is set. Please refer to the [spaces section](#spaces) to see what spaces we support! By default, `index.knn.space_type` is `l2`. For more information on index settings, such as algorithm parameters that can be tweaked to tune performance, please refer to the [documentation](settings#IndexSettings).

 Next, you must add one or more fields of the `knn_vector` data type. Here is an example that creates an index with two `knn_vector` fields and uses cosine similarity:
```
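The index-creation example that the last context line introduces is truncated by the diff. In the plugin's documented request style, it looks roughly like the following sketch; the index name, field names, and dimensions are illustrative, and `cosinesimil` is assumed to be the plugin's name for the cosine similarity space:

```
PUT my-knn-index-1
{
  "settings": {
    "index": {
      "knn": true,
      "knn.space_type": "cosinesimil"
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 2
      },
      "my_vector2": {
        "type": "knn_vector",
        "dimension": 4
      }
    }
  }
}
```

A search against such an index would then use the `knn` query clause, asking for the `k` nearest neighbors of a query `vector`:

```
GET my-knn-index-1/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector2": {
        "vector": [2, 3, 5, 6],
        "k": 2
      }
    }
  }
}
```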
**docs/knn/index.md** (+4 −4)

```diff
@@ -1,6 +1,6 @@
 ---
 layout: default
-title: KNN
+title: k-NN
 nav_order: 50
 has_children: true
 has_toc: false
@@ -20,23 +20,23 @@ This plugin supports three different methods for obtaining the k-nearest neighbo

 Approximate k-NN is the best choice for searches over large indices (i.e. hundreds of thousands of vectors or more) that require low latency. Approximate k-NN should not be used if a filter will be applied on the index before the k-NN search, greatly reducing the number of vectors to be searched. In this case, either the script scoring method or the painless extensions should be used.

-For more details refer to the [Approximate k-NN section](../approximate-knn).
+For more details refer to the [Approximate k-NN section](approximate-knn).

 2. **Script Score k-NN**

 The second method extends Elasticsearch's script scoring functionality to execute a brute force, exact k-NN search over "knn_vector" fields or fields that can represent binary objects. With this approach, users are able to run k-NN search on a subset of vectors in their index (sometimes referred to as a pre-filter search).

 This approach should be used for searches over smaller bodies of documents or when a pre-filter is needed. Using this approach on large indices may lead to high latencies.

-For more details refer to the [k-NN Script Score section](../knn-score-script).
+For more details refer to the [k-NN Script Score section](knn-score-script).

 3. **Painless extensions**

 The third method adds the distance functions as painless extensions that can be used in more complex combinations. Similar to the k-NN Script Score, this method can be used to perform a brute force, exact k-NN search across an index and supports pre-filtering.

 This approach has slightly slower query performance compared to Script Score k-NN. This approach should be preferred over Script Score k-NN if the use case requires more customization over the final score.

-For more details refer to the [painless functions sectior](../painless-functions).
+For more details refer to the [painless functions section](painless-functions).

 Overall, for larger data sets, users should generally choose the approximate nearest neighbor method, because it scales significantly better. For smaller data sets, where a user may want to apply a filter, they should choose the custom scoring approach. If users have a more complex use case where they need to use a distance function as part of their scoring method, they should use the painless scripting approach.
```
**docs/knn/knn-score-script.md** (+2 −2)

```diff
@@ -2,7 +2,7 @@
 layout: default
 title: Exact k-NN with Scoring Script
 nav_order: 2
-parent: KNN
+parent: k-NN
 has_children: false
 has_math: true
 ---
@@ -101,7 +101,7 @@ All parameters are required.
 *Note* -- After ODFE 1.11, `vector` was replaced by `query_value` due to the addition of the `bithamming` space.


-The [post filter example in the approximate approach](../approximate-knn#UsingApproximatek-NNWithFilters) shows a search that returns fewer than `k` results. If you want to avoid this situation, the score script method lets you essentially invert the order of events. In other words, you can filter down the set of documents you want to execute the k-nearest neighbor search over.
+The [post filter example in the approximate approach](../approximate-knn/#using-approximate-k-nn-with-filters) shows a search that returns fewer than `k` results. If you want to avoid this situation, the score script method lets you essentially invert the order of events. In other words, you can filter down the set of documents you want to execute the k-nearest neighbor search over.

 This example shows a pre-filter approach to k-NN search with the score script approach. First, create the index:
```
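The pre-filter example itself is cut off by the diff. Following the score-script request shape this page documents (`lang: knn`, `source: knn_score`, with `field`, `query_value`, and `space_type` params), a sketch looks like this; the index name, the `color` keyword field used for filtering, and the vectors are illustrative:

```
PUT my-knn-index-2
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 2
      },
      "color": {
        "type": "keyword"
      }
    }
  }
}

GET my-knn-index-2/_search
{
  "size": 2,
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "filter": {
            "term": {
              "color": "BLUE"
            }
          }
        }
      },
      "script": {
        "source": "knn_score",
        "lang": "knn",
        "params": {
          "field": "my_vector",
          "query_value": [9.9, 9.9],
          "space_type": "l2"
        }
      }
    }
  }
}
```

Because the `bool` filter runs before the script, the exact k-NN scoring touches only documents matching `color: BLUE`, which is the inversion of events the paragraph above describes.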
**docs/knn/painless-functions.md** (+6 −6)

````diff
@@ -2,7 +2,7 @@
 layout: default
 title: k-NN Painless Extensions
 nav_order: 3
-parent: KNN
+parent: k-NN
 has_children: false
 has_math: true
 ---
@@ -13,7 +13,7 @@ With the k-NN Plugin's Painless Scripting extensions, you can use k-NN distance

 ## Get started with k-NN's Painless Scripting Functions

-To use k-NN's Painless Scripting functions, first, you still need to create an index with `knn_vector` fields as was done in [k-NN score script](../knn-score-script#Getting_started_with_the_score_script). Once the index is created and you have ingested some data, you can use the painless extensions like so:
+To use k-NN's Painless Scripting functions, first, you still need to create an index with `knn_vector` fields as was done in [k-NN score script](../knn-score-script#Getting-started-with-the-score-script). Once the index is created and you have ingested some data, you can use the painless extensions like so:

 ```
 GET my-knn-index-2/_search
@@ -57,19 +57,19 @@ The following table contains the available painless functions the k-NN plugin pr
 <td>This function calculates the square of the L2 distance (Euclidean distance) between a given query vector and document vectors. The shorter the distance, the more relevant the document is, so this example inverts the return value of the l2Squared function. If the document vector matches the query vector, the result is 0, so this example also adds 1 to the distance to avoid divide by zero errors.</td>
-<td>Cosine similarity is inner product of the query vector and document vector normalized to both have length 1. If magnitude of the query vector does not change throughout the query, users can pass magnitude of query vector optionally to improve the performance instead of calculating the magnitude every time for every filtered document: `float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector)`. In general, range of cosine similarity is [-1, 1], but in case of information retrieval, the cosine similarity of two documents will range from 0 to 1, since tf-idf cannot be negative. Hence, we add 1.0 to the cosine similarity to score always positive. </td>
+<td>Cosine similarity is inner product of the query vector and document vector normalized to both have length 1. If magnitude of the query vector does not change throughout the query, users can pass magnitude of query vector optionally to improve the performance instead of calculating the magnitude every time for every filtered document: <code>float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector)</code>. In general, range of cosine similarity is [-1, 1], but in case of information retrieval, the cosine similarity of two documents will range from 0 to 1, since tf-idf cannot be negative. Hence, we add 1.0 to the cosine similarity to score always positive. </td>
 </tr>
 </table>


 ## Constraints
-1. If a document’s knn_vector field has different dimensions than the query, the function throws an IllegalArgumentException.
+1. If a document’s `knn_vector` field has different dimensions than the query, the function throws an `IllegalArgumentException`.
 2. If a vector field doesn't have a value, the function throws an IllegalStateException.

 You can avoid this situation by first checking if a document has a value for the field:
````
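The guard that this line introduces is truncated by the diff. A minimal sketch of the pattern, using the `cosineSimilarity` extension from the table above (the index, field name, and query vector are illustrative):

```
GET my-knn-index-2/_search
{
  "size": 2,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "doc['my_vector'].size() == 0 ? 0 : 1.0 + cosineSimilarity(params.query_value, doc['my_vector'])",
        "params": {
          "query_value": [9.9, 9.9]
        }
      }
    }
  }
}
```

The `doc['my_vector'].size() == 0` check assigns a fixed score of 0 to documents with no vector value instead of letting the function throw an `IllegalStateException`.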
**docs/knn/performance-tuning.md** (+3 −3)

```diff
@@ -1,7 +1,7 @@
 ---
 layout: default
 title: Performance Tuning
-parent: KNN
+parent: k-NN
 nav_order: 7
 ---

@@ -35,7 +35,7 @@ Having replicas set to 0, will avoid duplicate construction of graphs in both pr

 3. Increase number of indexing threads

-If the hardware we choose has multiple cores, we could allow multiple threads in graph construction and there by speed up the indexing process. You could determine the number of threads to be alloted by using the [knnalgo_paramindex_thread_qty]() setting.
+If the hardware we choose has multiple cores, we could allow multiple threads in graph construction and there by speed up the indexing process. You could determine the number of threads to be alloted by using the [knn.algo_param.index_thread_qty](../settings/#Cluster-settings) setting.

 Please keep an eye on CPU utilization and choose right number of threads. Since graph construction is costly, having multiple threads can put additional load on CPU.

@@ -94,4 +94,4 @@ As an example, assume that we have 1 Million vectors with dimension of 256 and M

 The standard KNN query and custom scoring option perform differently. Test using a representative set of documents to see if the search results and latencies match your expectations.

-Custom scoring works best if the initial filter reduces the number of documents to no more than 20,000. Increasing shard count can improve latencies, but be sure to keep shard size within [the recommended guidelines](../elasticsearch/#primary-and-replica-shards).
+Custom scoring works best if the initial filter reduces the number of documents to no more than 20,000. Increasing shard count can improve latencies, but be sure to keep shard size within [the recommended guidelines](../../elasticsearch/#primary-and-replica-shards).
```
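As a usage note for the corrected `knn.algo_param.index_thread_qty` link above: assuming it behaves like the plugin's other dynamic cluster settings, raising it would look something like this sketch (the value `2` is only an example; choose a count based on observed CPU utilization during graph construction):

```
PUT _cluster/settings
{
  "persistent": {
    "knn.algo_param.index_thread_qty": 2
  }
}
```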