This repository was archived by the owner on Aug 16, 2022. It is now read-only.

Commit 24e90a2 (2 parents: 8a2b628 + ba1720b)

Merge pull request #26 from opendistro/master

merging

File tree

8 files changed: 49 additions, 32 deletions


docs/ad/index.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -52,7 +52,7 @@ In this case, a feature is the field in your index that you to check for anomali
 
 For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature.
 
-You can add a maximum of five features for a detector.
+A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. We recommend adding fewer features to your detector for a higher accuracy. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `opendistro.anomaly_detection.max_anomaly_features` setting.
 {: .note }
 
 1. On the **Model configuration** page, enter the **Feature name**.
````
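The `opendistro.anomaly_detection.max_anomaly_features` setting introduced in this change can be adjusted through the standard Elasticsearch cluster settings API; a minimal sketch (the value `10` is an arbitrary example, not a recommendation):

```json
PUT _cluster/settings
{
  "persistent": {
    "opendistro.anomaly_detection.max_anomaly_features": 10
  }
}
```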

docs/im/ism/policies.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -26,6 +26,7 @@ Field | Description | Type | Required | Read Only
 :--- | :--- |:--- |:--- |
 `policy_id` | The name of the policy. | `string` | Yes | Yes
 `description` | A human-readable description of the policy. | `string` | Yes | No
+`ism_template` | Specify an ISM template pattern that matches the index to apply the policy. | `nested list of objects` | No | No
 `last_updated_time` | The time the policy was last updated. | `timestamp` | Yes | Yes
 `error_notification` | The destination and message template for error notifications. The destination could be Amazon Chime, Slack, or a webhook URL. | `object` | No | No
 `default_state` | The default starting state for each index that uses this policy. | `string` | Yes | No
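The `ism_template` field added here attaches a policy automatically to newly created indices whose names match a pattern. A minimal policy sketch, under the assumption that the template accepts an `index_patterns` array and an optional `priority` (the policy ID, state, and pattern are arbitrary examples):

```json
PUT _opendistro/_ism/policies/example_policy
{
  "policy": {
    "description": "Example policy applied automatically to new log indices.",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["log*"],
      "priority": 100
    }
  }
}
```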

docs/knn/api.md

Lines changed: 24 additions & 13 deletions

````diff
@@ -7,38 +7,43 @@ has_children: false
 ---
 
 # API
+
 The k-NN plugin adds two API operations in order to allow users to better manage the plugin's functionality.
 
+
 ## Stats
+
 The k-NN `stats` API provides information about the current status of the k-NN Plugin. The plugin keeps track of both cluster level and node level stats. Cluster level stats have a single value for the entire cluster. Node level stats have a single value for each node in the cluster. You can filter their query by nodeID and statName in the following way:
 ```
 GET /_opendistro/_knn/nodeId1,nodeId2/stats/statName1,statName2
 ```
 
 Statistic | Description
 :--- | :---
-`circuit_breaker_triggered` | Indicates whether the circuit breaker is triggered. This is only relevant to approximate k-NN search.
-`total_load_time` | The time in nanoseconds that KNN has taken to load graphs into the cache. This is only relevant to approximate k-NN search.
-`eviction_count` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. *note:* explicit evictions that occur because of index deletion are not counted. This is only relevant to approximate k-NN search.
-`hit_count` | The number of cache hits. A cache hit occurs when a user queries a graph and it is already loaded into memory. This is only relevant to approximate k-NN search.
-`miss_count` | The number of cache misses. A cache miss occurs when a user queries a graph and it has not yet been loaded into memory. This is only relevant to approximate k-NN search.
-`graph_memory_usage` | Current cache size (total size of all graphs in memory) in kilobytes. This is only relevant to approximate k-NN search.
+`circuit_breaker_triggered` | Indicates whether the circuit breaker is triggered. This statistic is only relevant to approximate k-NN search.
+`total_load_time` | The time in nanoseconds that KNN has taken to load graphs into the cache. This statistic is only relevant to approximate k-NN search.
+`eviction_count` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. Note: Explicit evictions that occur because of index deletion are not counted. This statistic is only relevant to approximate k-NN search.
+`hit_count` | The number of cache hits. A cache hit occurs when a user queries a graph and it is already loaded into memory. This statistic is only relevant to approximate k-NN search.
+`miss_count` | The number of cache misses. A cache miss occurs when a user queries a graph and it has not yet been loaded into memory. This statistic is only relevant to approximate k-NN search.
+`graph_memory_usage` | Current cache size (total size of all graphs in memory) in kilobytes. This statistic is only relevant to approximate k-NN search.
 `graph_memory_usage_percentage` | The current weight of the cache as a percentage of the maximum cache capacity.
 `graph_index_requests` | The number of requests to add the knn_vector field of a document into a graph.
 `graph_index_errors` | The number of requests to add the knn_vector field of a document into a graph that have produced an error.
 `graph_query_requests` | The number of graph queries that have been made.
 `graph_query_errors` | The number of graph queries that have produced an error.
 `knn_query_requests` | The number of KNN query requests received.
-`cache_capacity_reached` | Whether `knn.memory.circuit_breaker.limit` has been reached. This is only relevant to approximate k-NN search.
-`load_success_count` | The number of times KNN successfully loaded a graph into the cache. This is only relevant to approximate k-NN search.
-`load_exception_count` | The number of times an exception occurred when trying to load a graph into the cache. This is only relevant to approximate k-NN search.
+`cache_capacity_reached` | Whether `knn.memory.circuit_breaker.limit` has been reached. This statistic is only relevant to approximate k-NN search.
+`load_success_count` | The number of times KNN successfully loaded a graph into the cache. This statistic is only relevant to approximate k-NN search.
+`load_exception_count` | The number of times an exception occurred when trying to load a graph into the cache. This statistic is only relevant to approximate k-NN search.
 `indices_in_cache` | For each index that has graphs in the cache, this stat provides the number of graphs that index has and the total graph_memory_usage that index is using in Kilobytes.
-`script_compilations` | The number of times the KNN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the KNN script might be recompiled. This is only relevant to k-NN score script search.
-`script_compilation_errors` | The number of errors during script compilation. This is only relevant to k-NN score script search.
-`script_query_requests` | The total number of script queries. This is only relevant to k-NN score script search.
-`script_query_errors` | The number of errors during script queries. This is only relevant to k-NN score script search.
+`script_compilations` | The number of times the KNN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the KNN script might be recompiled. This statistic is only relevant to k-NN score script search.
+`script_compilation_errors` | The number of errors during script compilation. This statistic is only relevant to k-NN score script search.
+`script_query_requests` | The total number of script queries. This statistic is only relevant to k-NN score script search.
+`script_query_errors` | The number of errors during script queries. This statistic is only relevant to k-NN score script search.
+
 
 ### Usage
+
 ```json
 GET /_opendistro/_knn/stats?pretty
 {
@@ -99,7 +104,9 @@ GET /_opendistro/_knn/HYMrXXsBSamUkcAjhjeN0w/stats/circuit_breaker_triggered,gra
 }
 ```
 
+
 ## Warmup operation
+
 The Hierarchical Navigable Small World (HNSW) graphs that are used to perform an approximate k-Nearest Neighbor (k-NN) search are stored as `.hnsw` files with other Apache Lucene segment files. In order for you to perform a search on these graphs using the k-NN plugin, these files need to be loaded into native memory.
 
 If the plugin has not loaded the graphs into native memory, it loads them when it receives a search request. This loading time can cause high latency during initial queries. To avoid this situation, users often run random queries during a warmup period. After this warmup period, the graphs are loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort.
@@ -108,7 +115,9 @@ As an alternative, you can avoid this latency issue by running the k-NN plugin w
 
 After the process finishes, you can start searching against the indices with no initial latency penalties. The warmup API operation is idempotent, so if a segment's graphs are already loaded into memory, this operation has no impact on those graphs. It only loads graphs that aren't currently in memory.
 
+
 ### Usage
+
 This request performs a warmup on three indices:
 
 ```json
@@ -132,7 +141,9 @@ GET /_tasks
 
 After the operation has finished, use the [k-NN `_stats` API operation](#Stats) to see what the k-NN plugin loaded into the graph.
 
+
 ### Best practices
+
 For the warmup operation to function properly, follow these best practices.
 
 First, don't run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. For example, you could encounter a situation in which the warmup API operation loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B would no longer be in memory, and graph C would also not be in memory. In this case, the initial penalty for loading graph C is still present.
````
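The warmup request body is cut off by the diff view above. As an illustrative sketch only, assuming the warmup endpoint follows the same URL scheme as the `stats` endpoint and takes a comma-separated index list (the index names are placeholders):

```json
GET /_opendistro/_knn/warmup/index1,index2,index3?pretty
```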

docs/knn/performance-tuning.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -84,6 +84,7 @@ Recall depends on multiple factors like number of vectors, number of dimensions,
 Recall can be configured by adjusting the algorithm parameters of the HNSW algorithm exposed through index settings. Algorithm params that control recall are m, ef_construction, ef_search. For more details on influence of algorithm parameters on the indexing and search recall, please refer to the [HNSW algorithm parameters document](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). Increasing these values could help recall (leading to better search results) but at the cost of higher memory utilization and increased indexing time. Our default values work on a broader set of use cases from our experiments, but we encourage users to run their own experiments on their data sets and choose the appropriate values. For index-level settings, please refer to the [settings page](../settings#index-settings). We will add details on our experiments here shortly.
 
 ## Estimating Memory Usage
+
 Typically, in an Elasticsearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates graphs to a portion of the remaining RAM. This portion's size is determined by the circuit_breaker_limit cluster setting. By default, the circuit breaker limit is set at 50%.
 
 The memory required for graphs is estimated to be `1.1 * (4 * dimension + 8 * M)` bytes/vector.
````
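The `1.1 * (4 * dimension + 8 * M)` estimate above is easy to turn into a back-of-the-envelope calculator; a minimal sketch (the vector count, dimension, and `M` value are arbitrary examples):

```python
def knn_graph_memory_bytes(num_vectors: int, dimension: int, m: int) -> float:
    """Estimate native memory for HNSW graphs using the rule of thumb
    1.1 * (4 * dimension + 8 * M) bytes per vector."""
    return 1.1 * (4 * dimension + 8 * m) * num_vectors

# Example: 1 million 256-dimensional vectors with M = 16
estimate = knn_graph_memory_bytes(1_000_000, 256, 16)
print(f"{estimate / 1024 ** 3:.2f} GiB")  # ≈ 1.18 GiB
```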

docs/security/access-control/api.md

Lines changed: 7 additions & 1 deletion

````diff
@@ -402,12 +402,15 @@ DELETE _opendistro/_security/api/internalusers/<username>
 
 Creates or replaces the specified user. You must specify either `password` (plain text) or `hash` (the hashed user password). If you specify `password`, the security plugin automatically hashes the password before storing it.
 
+Note that any role you supply in the `opendistro_security_roles` array must already exist for the security plugin to map the user to that role. To see predefined roles, refer to [the list of predefined roles](../users-roles/#predefined-roles). For instructions on how to create a role, refer to [creating a role](./#create-role).
+
 #### Request
 
 ```json
 PUT _opendistro/_security/api/internalusers/<username>
 {
   "password": "kirkpass",
+  "opendistro_security_roles": ["maintenance_staff", "weapons"],
   "backend_roles": ["captains", "starfleet"],
   "attributes": {
     "attribute1": "value1",
@@ -428,7 +431,7 @@ PUT _opendistro/_security/api/internalusers/<username>
 
 ### Patch user
 
-Updates individual attributes of an internal user.
+Updates individual attributes of an internal user.
 
 #### Request
 
@@ -438,6 +441,9 @@ PATCH _opendistro/_security/api/internalusers/<username>
   {
     "op": "replace", "path": "/backend_roles", "value": ["klingons"]
   },
+  {
+    "op": "replace", "path": "/opendistro_security_roles", "value": ["ship_manager"]
+  },
   {
     "op": "replace", "path": "/attributes", "value": { "newattribute": "newvalue" }
   }
````

docs/security/access-control/cross-cluster-search.md

Lines changed: 11 additions & 11 deletions

````diff
@@ -117,13 +117,13 @@ networks:
 After the clusters start, verify the names of each:
 
 ```json
-curl -XGET -u 'admin:admin' -k https://localhost:9200
+curl -XGET -u 'admin:admin' -k 'https://localhost:9200'
 {
   "cluster_name" : "odfe-cluster1",
   ...
 }
 
-curl -XGET -u 'admin:admin' -k https://localhost:9250
+curl -XGET -u 'admin:admin' -k 'https://localhost:9250'
 {
   "cluster_name" : "odfe-cluster2",
   ...
@@ -151,7 +151,7 @@ docker inspect --format='{% raw %}{{range .NetworkSettings.Networks}}{{.IPAddres
 On the coordinating cluster, add the remote cluster name and the IP address (with port 9300) for each "seed node." In this case, you only have one seed node:
 
 ```json
-curl -k -XPUT -H 'Content-Type: application/json' -u 'admin:admin' https://localhost:9250/_cluster/settings -d '
+curl -k -XPUT -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9250/_cluster/settings' -d '
 {
   "persistent": {
     "search.remote": {
@@ -166,13 +166,13 @@ curl -k -XPUT -H 'Content-Type: application/json' -u 'admin:admin' https://local
 On the remote cluster, index a document:
 
 ```bash
-curl -XPUT -k -H 'Content-Type: application/json' -u 'admin:admin' https://localhost:9200/books/_doc/1 -d '{"Dracula": "Bram Stoker"}'
+curl -XPUT -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9200/books/_doc/1' -d '{"Dracula": "Bram Stoker"}'
 ```
 
 At this point, cross-cluster search works. You can test it using the `admin` user:
 
 ```bash
-curl -XGET -k -u 'admin:admin' https://localhost:9250/odfe-cluster1:books/_search?pretty
+curl -XGET -k -u 'admin:admin' 'https://localhost:9250/odfe-cluster1:books/_search?pretty'
 {
   ...
   "hits": [{
@@ -190,14 +190,14 @@ curl -XGET -k -u 'admin:admin' https://localhost:9250/odfe-cluster1:books/_searc
 To continue testing, create a new user on both clusters:
 
 ```bash
-curl -XPUT -k -u 'admin:admin' https://localhost:9200/_opendistro/_security/api/internalusers/booksuser -H 'Content-Type: application/json' -d '{"password":"password"}'
-curl -XPUT -k -u 'admin:admin' https://localhost:9250/_opendistro/_security/api/internalusers/booksuser -H 'Content-Type: application/json' -d '{"password":"password"}'
+curl -XPUT -k -u 'admin:admin' 'https://localhost:9200/_opendistro/_security/api/internalusers/booksuser' -H 'Content-Type: application/json' -d '{"password":"password"}'
+curl -XPUT -k -u 'admin:admin' 'https://localhost:9250/_opendistro/_security/api/internalusers/booksuser' -H 'Content-Type: application/json' -d '{"password":"password"}'
 ```
 
 Then run the same search as before with `booksuser`:
 
 ```json
-curl -XGET -k -u booksuser:password https://localhost:9250/odfe-cluster1:books/_search?pretty
+curl -XGET -k -u booksuser:password 'https://localhost:9250/odfe-cluster1:books/_search?pretty'
 {
   "error" : {
     "root_cause" : [
@@ -216,8 +216,8 @@ curl -XGET -k -u booksuser:password https://localhost:9250/odfe-cluster1:books/_
 Note the permissions error. On the remote cluster, create a role with the appropriate permissions, and map `booksuser` to that role:
 
 ```bash
-curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' https://localhost:9200/_opendistro/_security/api/roles/booksrole -d '{"index_permissions":[{"index_patterns":["books"],"allowed_actions":["indices:admin/shards/search_shards","indices:data/read/search"]}]}'
-curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' https://localhost:9200/_opendistro/_security/api/rolesmapping/booksrole -d '{"users" : ["booksuser"]}'
+curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' 'https://localhost:9200/_opendistro/_security/api/roles/booksrole' -d '{"index_permissions":[{"index_patterns":["books"],"allowed_actions":["indices:admin/shards/search_shards","indices:data/read/search"]}]}'
+curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' 'https://localhost:9200/_opendistro/_security/api/rolesmapping/booksrole' -d '{"users" : ["booksuser"]}'
 ```
 
 Both clusters must have the user, but only the remote cluster needs the role and mapping; in this case, the coordinating cluster handles authentication (i.e. "Does this request include valid user credentials?"), and the remote cluster handles authorization (i.e. "Can this user access this data?").
@@ -226,7 +226,7 @@ Both clusters must have the user, but only the remote cluster needs the role and
 Finally, repeat the search:
 
 ```bash
-curl -XGET -k -u booksuser:password https://localhost:9250/odfe-cluster1:books/_search?pretty
+curl -XGET -k -u booksuser:password 'https://localhost:9250/odfe-cluster1:books/_search?pretty'
 {
   ...
   "hits": [{
````

docs/security/access-control/users-roles.md

Lines changed: 2 additions & 4 deletions

````diff
@@ -26,7 +26,7 @@ Unless you need to create new [read-only or hidden users](../api/#read-only-and-
 
 ## Create users
 
-You can create users using Kibana, `internal_users.yml`, or the REST API.
+You can create users using Kibana, `internal_users.yml`, or the REST API. When creating a user, you can map users to roles using `internal_users.yml` or the REST API, but that feature is not currently available in Kibana.
 
 ### Kibana
 
@@ -38,7 +38,6 @@ You can create users using Kibana, `internal_users.yml`, or the REST API.
 
 1. Choose **Submit**.
 
-
 ### internal_users.yml
 
 See [YAML files](../../configuration/yaml/#internal_usersyml).
@@ -77,11 +76,10 @@ See [Create role](../api/#create-role).
 
 ## Map users to roles
 
-After creating roles, you map users (or backend roles) to them. Intuitively, people often think of this process as giving a user one or more roles, but in the security plugin, the process is reversed; you select a role and then map one or more users to it.
+If you didn't specify roles when you created your user, you can map roles to it afterwards.
 
 Just like users and roles, you create role mappings using Kibana, `roles_mapping.yml`, or the REST API.
 
-
 ### Kibana
 
 1. Choose **Security**, **Roles**, and a role.
````

docs/security/configuration/yaml.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -34,9 +34,9 @@ new-user:
   reserved: false
   hidden: false
   opendistro_security_roles:
-  - "some-security-role"
+  - "specify-some-security-role-here"
   backend_roles:
-  - "some-backend-role"
+  - "specify-some-backend-role-here"
   attributes:
     attribute1: "value1"
   static: false
````
