This repository was archived by the owner on Aug 16, 2022. It is now read-only.
docs/ad/index.md (1 addition & 1 deletion)
@@ -52,7 +52,7 @@ In this case, a feature is the field in your index that you to check for anomali
For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature.
-You can add a maximum of five features for a detector.
+A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes multi-feature models less likely than single-feature models to identify smaller anomalies. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. We recommend adding fewer features to your detector for higher accuracy. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `opendistro.anomaly_detection.max_anomaly_features` setting.
{: .note }
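To get a feel for the curse of dimensionality mentioned above, the following Python sketch measures how the ratio between the farthest and nearest distances from a query point shrinks as the number of dimensions grows. It is a generic illustration using uniformly random points and Euclidean distance, not a model of how the anomaly detection plugin scores anomalies; the `relative_contrast` helper is hypothetical.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """Farthest-to-nearest distance ratio from a random query point.

    Generic illustration only (hypothetical helper, not plugin code).
    Values near 1.0 mean every point looks about equally far away.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return max(distances) / min(distances)


for dim in (1, 5, 50, 500):
    print(f"{dim:>3} dimensions: farthest/nearest = {relative_contrast(dim):.2f}")
```

As the ratio approaches 1, points become harder to tell apart by distance, so a point that is unusual in only one feature stands out less.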
1. On the **Model configuration** page, enter the **Feature name**.
docs/im/ism/policies.md (1 addition & 0 deletions)
@@ -26,6 +26,7 @@ Field | Description | Type | Required | Read Only
:--- | :--- |:--- |:--- |
`policy_id` | The name of the policy. | `string` | Yes | Yes
`description` | A human-readable description of the policy. | `string` | Yes | No
+`ism_template` | Specifies ISM template patterns that match the indices to which the policy applies. | `nested list of objects` | No | No
`last_updated_time` | The time the policy was last updated. | `timestamp` | Yes | Yes
`error_notification` | The destination and message template for error notifications. The destination could be Amazon Chime, Slack, or a webhook URL. | `object` | No | No
`default_state` | The default starting state for each index that uses this policy. | `string` | Yes | No
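The following Python sketch shows one way to read "an ISM template pattern that matches the index": glob-style patterns compared against an index name. The policy documents are hypothetical, and the shape assumed for `ism_template` (a list of objects, each with an `index_patterns` list) is an assumption for illustration, not a confirmed schema.

```python
from fnmatch import fnmatchcase

# Hypothetical policies; the ism_template shape here is an assumption.
policies = [
    {"policy_id": "rollover_logs", "ism_template": [{"index_patterns": ["logs-*"]}]},
    {"policy_id": "delete_tmp", "ism_template": [{"index_patterns": ["tmp-*", "scratch-*"]}]},
    {"policy_id": "manual_only"},  # no ism_template, so this sketch never matches it
]


def matching_policy_ids(index_name, policies):
    """Return IDs of policies with at least one template pattern matching index_name."""
    matches = []
    for policy in policies:
        for template in policy.get("ism_template", []):
            patterns = template.get("index_patterns", [])
            if any(fnmatchcase(index_name, pattern) for pattern in patterns):
                matches.append(policy["policy_id"])
                break  # one matching template is enough for this policy
    return matches


print(matching_policy_ids("logs-2021.03.01", policies))  # ['rollover_logs']
print(matching_policy_ids("scratch-1", policies))        # ['delete_tmp']
```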
docs/knn/api.md (24 additions & 13 deletions)
@@ -7,38 +7,43 @@ has_children: false
---
# API
+
The k-NN plugin adds two API operations in order to allow users to better manage the plugin's functionality.
+
## Stats
+
The k-NN `stats` API provides information about the current status of the k-NN plugin. The plugin keeps track of both cluster-level and node-level stats. Cluster-level stats have a single value for the entire cluster. Node-level stats have a single value for each node in the cluster. You can filter your query by node ID and stat name in the following way:
```
GET /_opendistro/_knn/nodeId1,nodeId2/stats/statName1,statName2
```
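A small Python helper can assemble the filtered stats path from node IDs and stat names, following the pattern shown above. `knn_stats_path` is a hypothetical name, and the sketch assumes that either filter may be omitted.

```python
def knn_stats_path(node_ids=None, stat_names=None):
    """Build a k-NN stats request path (hypothetical helper).

    Assumes node IDs and stat names are each optional comma-separated filters.
    """
    path = "/_opendistro/_knn"
    if node_ids:
        path += "/" + ",".join(node_ids)
    path += "/stats"
    if stat_names:
        path += "/" + ",".join(stat_names)
    return path


print(knn_stats_path(["nodeId1", "nodeId2"], ["statName1", "statName2"]))
```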
Statistic | Description
:--- | :---
-`circuit_breaker_triggered` | Indicates whether the circuit breaker is triggered. This is only relevant to approximate k-NN search.
-`total_load_time` | The time in nanoseconds that KNN has taken to load graphs into the cache. This is only relevant to approximate k-NN search.
-`eviction_count` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. *note:* explicit evictions that occur because of index deletion are not counted. This is only relevant to approximate k-NN search.
-`hit_count` | The number of cache hits. A cache hit occurs when a user queries a graph and it is already loaded into memory. This is only relevant to approximate k-NN search.
-`miss_count` | The number of cache misses. A cache miss occurs when a user queries a graph and it has not yet been loaded into memory. This is only relevant to approximate k-NN search.
-`graph_memory_usage` | Current cache size (total size of all graphs in memory) in kilobytes. This is only relevant to approximate k-NN search.
+`circuit_breaker_triggered` | Indicates whether the circuit breaker is triggered. This statistic is only relevant to approximate k-NN search.
+`total_load_time` | The time in nanoseconds that KNN has taken to load graphs into the cache. This statistic is only relevant to approximate k-NN search.
+`eviction_count` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. Note: Explicit evictions that occur because of index deletion are not counted. This statistic is only relevant to approximate k-NN search.
+`hit_count` | The number of cache hits. A cache hit occurs when a user queries a graph and it is already loaded into memory. This statistic is only relevant to approximate k-NN search.
+`miss_count` | The number of cache misses. A cache miss occurs when a user queries a graph and it has not yet been loaded into memory. This statistic is only relevant to approximate k-NN search.
+`graph_memory_usage` | Current cache size (total size of all graphs in memory) in kilobytes. This statistic is only relevant to approximate k-NN search.
`graph_memory_usage_percentage` | The current weight of the cache as a percentage of the maximum cache capacity.
`graph_index_requests` | The number of requests to add the knn_vector field of a document into a graph.
`graph_index_errors` | The number of requests to add the knn_vector field of a document into a graph that have produced an error.
`graph_query_requests` | The number of graph queries that have been made.
`graph_query_errors` | The number of graph queries that have produced an error.
`knn_query_requests` | The number of KNN query requests received.
-`cache_capacity_reached` | Whether `knn.memory.circuit_breaker.limit` has been reached. This is only relevant to approximate k-NN search.
-`load_success_count` | The number of times KNN successfully loaded a graph into the cache. This is only relevant to approximate k-NN search.
-`load_exception_count` | The number of times an exception occurred when trying to load a graph into the cache. This is only relevant to approximate k-NN search.
+`cache_capacity_reached` | Whether `knn.memory.circuit_breaker.limit` has been reached. This statistic is only relevant to approximate k-NN search.
+`load_success_count` | The number of times KNN successfully loaded a graph into the cache. This statistic is only relevant to approximate k-NN search.
+`load_exception_count` | The number of times an exception occurred when trying to load a graph into the cache. This statistic is only relevant to approximate k-NN search.
`indices_in_cache` | For each index that has graphs in the cache, this stat provides the number of graphs that index has and the total graph_memory_usage that index is using in Kilobytes.
-`script_compilations` | The number of times the KNN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the KNN script might be recompiled. This is only relevant to k-NN score script search.
-`script_compilation_errors` | The number of errors during script compilation. This is only relevant to k-NN score script search.
-`script_query_requests` | The total number of script queries. This is only relevant to k-NN score script search.
-`script_query_errors` | The number of errors during script queries. This is only relevant to k-NN score script search.
+`script_compilations` | The number of times the KNN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the KNN script might be recompiled. This statistic is only relevant to k-NN score script search.
+`script_compilation_errors` | The number of errors during script compilation. This statistic is only relevant to k-NN score script search.
+`script_query_requests` | The total number of script queries. This statistic is only relevant to k-NN score script search.
+`script_query_errors` | The number of errors during script queries. This statistic is only relevant to k-NN score script search.
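Because `hit_count` and `miss_count` together count every graph lookup, their ratio gives a quick measure of cache effectiveness. This Python sketch assumes the stats response nests node-level statistics under a `nodes` object keyed by node ID; the sample response is invented for illustration.

```python
def cache_hit_ratios(stats_response):
    """Per-node share of graph lookups served from the cache.

    Assumes (not confirmed here) a "nodes" object keyed by node ID, each
    holding that node's hit_count and miss_count.
    """
    ratios = {}
    for node_id, node_stats in stats_response.get("nodes", {}).items():
        hits = node_stats.get("hit_count", 0)
        misses = node_stats.get("miss_count", 0)
        lookups = hits + misses
        ratios[node_id] = hits / lookups if lookups else None  # None: no lookups yet
    return ratios


# Hypothetical, trimmed response used only for illustration.
sample = {"nodes": {"node-a": {"hit_count": 90, "miss_count": 10},
                    "node-b": {"hit_count": 0, "miss_count": 0}}}
print(cache_hit_ratios(sample))  # {'node-a': 0.9, 'node-b': None}
```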
+
### Usage
+
```json
GET /_opendistro/_knn/stats?pretty
{
@@ -99,7 +104,9 @@ GET /_opendistro/_knn/HYMrXXsBSamUkcAjhjeN0w/stats/circuit_breaker_triggered,gra
}
```
+
## Warmup operation
+
The Hierarchical Navigable Small World (HNSW) graphs that are used to perform an approximate k-Nearest Neighbor (k-NN) search are stored as `.hnsw` files with other Apache Lucene segment files. In order for you to perform a search on these graphs using the k-NN plugin, these files need to be loaded into native memory.
If the plugin has not loaded the graphs into native memory, it loads them when it receives a search request. This loading time can cause high latency during initial queries. To avoid this situation, users often run random queries during a warmup period. After this warmup period, the graphs are loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort.
@@ -108,7 +115,9 @@ As an alternative, you can avoid this latency issue by running the k-NN plugin w
After the process finishes, you can start searching against the indices with no initial latency penalties. The warmup API operation is idempotent, so if a segment's graphs are already loaded into memory, this operation has no impact on those graphs. It only loads graphs that aren't currently in memory.
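The idempotence described above can be modeled with a set of loaded graphs. This Python toy model is an illustration only, not the plugin's implementation; `warm_up` and the segment names are hypothetical.

```python
def warm_up(segment_graphs, loaded):
    """Toy model of warmup idempotence (hypothetical, not plugin code).

    segment_graphs: graph names for the segments of the requested indices.
    loaded: graph names already in native memory; updated in place.
    Returns only the graphs this call loaded.
    """
    newly_loaded = []
    for graph in segment_graphs:
        if graph not in loaded:
            loaded.add(graph)
            newly_loaded.append(graph)
    return newly_loaded


in_memory = {"seg_a"}
print(warm_up(["seg_a", "seg_b"], in_memory))  # ['seg_b']
print(warm_up(["seg_a", "seg_b"], in_memory))  # [] on the repeat call
```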
+
### Usage
+
This request performs a warmup on three indices:
```json
@@ -132,7 +141,9 @@ GET /_tasks
After the operation has finished, use the [k-NN `_stats` API operation](#stats) to see which graphs the k-NN plugin loaded into native memory.
+
### Best practices
+
For the warmup operation to function properly, follow these best practices.
First, don't run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. For example, you could encounter a situation in which the warmup API operation loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B would no longer be in memory, and graph C would also not be in memory. In this case, the initial penalty for loading graph C is still present.
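The merge scenario above can be traced with a similar toy model. The helper names are hypothetical, and the sketch assumes that a merge removes the merged segments' graphs from memory, as the example describes.

```python
def merge_segments(live_segments, loaded, merged, new_segment):
    """Toy merge (hypothetical): replace the merged segments with new_segment.

    Assumes graphs of the merged segments leave native memory and that
    nothing has loaded the new segment's graph yet.
    """
    survivors = [s for s in live_segments if s not in merged] + [new_segment]
    still_loaded = {g for g in loaded if g in survivors}
    return survivors, still_loaded


def cold_graphs(live_segments, loaded):
    """Live segments whose graphs would have to be loaded at query time."""
    return sorted(set(live_segments) - set(loaded))


segments, in_memory = ["A", "B"], {"A", "B"}  # state right after warmup
print(cold_graphs(segments, in_memory))        # []

segments, in_memory = merge_segments(segments, in_memory, {"A", "B"}, "C")
print(cold_graphs(segments, in_memory))        # ['C']
```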
docs/knn/performance-tuning.md (1 addition & 0 deletions)
@@ -84,6 +84,7 @@ Recall depends on multiple factors like number of vectors, number of dimensions,
Recall can be configured by adjusting the algorithm parameters of the HNSW algorithm exposed through index settings. The algorithm parameters that control recall are `m`, `ef_construction`, and `ef_search`. For more details on the influence of algorithm parameters on indexing and search recall, please refer to the [HNSW algorithm parameters document](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). Increasing these values could help recall (leading to better search results) but at the cost of higher memory utilization and increased indexing time. In our experiments, the default values work well across a broad set of use cases, but we encourage users to run their own experiments on their data sets and choose the appropriate values. For index-level settings, please refer to the [settings page](../settings#index-settings). We will add details on our experiments here shortly.
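When running the recall experiments suggested above, it can help to generate one index-settings body per parameter combination. In this Python sketch, the setting names `index.knn.algo_param.m`, `index.knn.algo_param.ef_construction`, and `index.knn.algo_param.ef_search` are assumptions to check against the settings page, and the candidate values are arbitrary.

```python
from itertools import product


def hnsw_settings(m, ef_construction, ef_search):
    """Index settings body for one HNSW parameter combination.

    Setting names are assumptions; verify them on the settings page.
    """
    return {
        "settings": {
            "index.knn": True,
            "index.knn.algo_param.m": m,
            "index.knn.algo_param.ef_construction": ef_construction,
            "index.knn.algo_param.ef_search": ef_search,
        }
    }


# Hypothetical candidate values for a recall experiment.
grid = [hnsw_settings(m, efc, efs)
        for m, efc, efs in product([8, 16, 32], [128, 512], [128, 512])]
print(len(grid))  # 12
```

Each body could create one test index; comparing recall, memory use, and indexing time across them makes the trade-off described above concrete for your data.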
## Estimating Memory Usage
+
Typically, in an Elasticsearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates graphs to a portion of the remaining RAM. This portion's size is determined by the `knn.memory.circuit_breaker.limit` cluster setting. By default, the circuit breaker limit is set at 50%.
The memory required for graphs is estimated to be `1.1 * (4 * dimension + 8 * M)` bytes/vector.
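Plugging numbers into the formula above gives a rough capacity estimate. This Python sketch also divides by the circuit breaker limit to estimate how much RAM outside the JVM heap the graphs need; the vector count, dimension, and `M` value are hypothetical, and the division assumes the 50% default applies to the remaining RAM as described.

```python
def graph_memory_bytes(num_vectors, dimension, m):
    """Estimated graph memory: 1.1 * (4 * dimension + 8 * M) bytes per vector."""
    return 1.1 * (4 * dimension + 8 * m) * num_vectors


def ram_outside_heap_bytes(graph_bytes, circuit_breaker_limit=0.5):
    """RAM outside the heap so the graphs fit under the circuit breaker limit."""
    return graph_bytes / circuit_breaker_limit


graphs = graph_memory_bytes(num_vectors=1_000_000, dimension=256, m=16)
print(f"graphs: {graphs / 2**30:.2f} GiB")
print(f"RAM outside the heap at a 50% limit: {ram_outside_heap_bytes(graphs) / 2**30:.2f} GiB")
```

For these hypothetical inputs, the graphs need about 1.18 GiB, so roughly 2.36 GiB of RAM outside the heap at the default limit.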
Creates or replaces the specified user. You must specify either `password` (plain text) or `hash` (the hashed user password). If you specify `password`, the security plugin automatically hashes the password before storing it.
+Note that any role you supply in the `opendistro_security_roles` array must already exist for the security plugin to map the user to that role. To see predefined roles, refer to [the list of predefined roles](../users-roles/#predefined-roles). For instructions on how to create a role, refer to [creating a role](./#create-role).
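The two rules above (supply `password` or `hash`, and map only to roles that already exist) can be checked on the client before sending the request. This Python sketch is hypothetical: `internal_user_body` is not part of the security plugin, and treating `password` and `hash` as mutually exclusive is an assumption of the sketch.

```python
def internal_user_body(existing_roles, password=None, password_hash=None, roles=()):
    """Body for PUT _opendistro/_security/api/internalusers/<username>.

    Hypothetical client-side checks: exactly one of password/hash (an
    assumption), and every role in opendistro_security_roles must exist.
    """
    if (password is None) == (password_hash is None):
        raise ValueError("specify exactly one of password or hash")
    missing = sorted(set(roles) - set(existing_roles))
    if missing:
        raise ValueError(f"create these roles first: {missing}")
    body = {"opendistro_security_roles": list(roles)}
    if password is not None:
        body["password"] = password  # plain text; the plugin hashes it
    else:
        body["hash"] = password_hash
    return body


existing = {"readall", "kibana_user"}
print(internal_user_body(existing, password="example-password", roles=["readall"]))
```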
+
#### Request
```json
PUT _opendistro/_security/api/internalusers/<username>
@@ -151,7 +151,7 @@ docker inspect --format='{% raw %}{{range .NetworkSettings.Networks}}{{.IPAddres
On the coordinating cluster, add the remote cluster name and the IP address (with port 9300) for each "seed node." In this case, you only have one seed node:
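A sketch of the settings body for this step, written in Python. It assumes the Elasticsearch 7.x `cluster.remote.<alias>.seeds` setting sent with `PUT _cluster/settings`; the alias and IP address are hypothetical, so compare against the request in the full guide.

```python
def remote_cluster_settings(cluster_alias, seed_ips, port=9300):
    """Persistent cluster-settings body registering remote seed nodes.

    Assumes the Elasticsearch 7.x setting name cluster.remote.<alias>.seeds.
    """
    return {
        "persistent": {
            f"cluster.remote.{cluster_alias}.seeds": [f"{ip}:{port}" for ip in seed_ips]
        }
    }


# Hypothetical alias and IP address (for example, one found with docker inspect).
print(remote_cluster_settings("opendistro-remote", ["172.18.0.2"]))
```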
Both clusters must have the user, but only the remote cluster needs the role and mapping; in this case, the coordinating cluster handles authentication (i.e. "Does this request include valid user credentials?"), and the remote cluster handles authorization (i.e. "Can this user access this data?").
@@ -226,7 +226,7 @@ Both clusters must have the user, but only the remote cluster needs the role and
docs/security/access-control/users-roles.md (2 additions & 4 deletions)
@@ -26,7 +26,7 @@ Unless you need to create new [read-only or hidden users](../api/#read-only-and-
## Create users
-You can create users using Kibana, `internal_users.yml`, or the REST API.
+You can create users using Kibana, `internal_users.yml`, or the REST API. When creating a user, you can map it to roles using `internal_users.yml` or the REST API, but Kibana does not currently support this.
### Kibana
@@ -38,7 +38,6 @@ You can create users using Kibana, `internal_users.yml`, or the REST API.
1. Choose **Submit**.
-
### internal_users.yml
See [YAML files](../../configuration/yaml/#internal_usersyml).
@@ -77,11 +76,10 @@ See [Create role](../api/#create-role).
## Map users to roles
-After creating roles, you map users (or backend roles) to them. Intuitively, people often think of this process as giving a user one or more roles, but in the security plugin, the process is reversed; you select a role and then map one or more users to it.
+If you didn't specify roles when you created your user, you can map roles to it afterwards.
Just like users and roles, you create role mappings using Kibana, `roles_mapping.yml`, or the REST API.