
Commit 444e806

Merge pull request #27 from opendistro/master
merge
2 parents 24e90a2 + e3dc3b0

8 files changed, +102 -43 lines changed

docs/ad/index.md

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ In this case, a feature is the field in your index that you to check for anomali

 For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature.

-A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. We recommend adding fewer features to your detector for a higher accuracy. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `opendistro.anomaly_detection.max_anomaly_features` setting.
+A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. We recommend experimenting with different feature sets on a historical detector and checking the precision before moving on to real-time detectors. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `opendistro.anomaly_detection.max_anomaly_features` setting.
 {: .note }

 1. On the **Model configuration** page, enter the **Feature name**.
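For reference, the feature limit mentioned above can typically be changed through the cluster settings API. The following is a minimal sketch only, assuming the setting is dynamic on your cluster; the endpoint, `-k` flag, and `admin:admin` credentials are placeholders for a local Open Distro install, not part of this commit:

```bash
# Raise or lower the per-detector feature limit (placeholder values).
curl -k -u admin:admin -XPUT "https://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"opendistro.anomaly_detection.max_anomaly_features": 3}}'
```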

docs/cli/index.md

Lines changed: 22 additions & 17 deletions
@@ -9,7 +9,9 @@ has_children: false

 The Open Distro for Elasticsearch command line interface (odfe-cli) lets you manage your ODFE cluster from the command line and automate tasks.

-Currently, odfe-cli only supports the [Anomaly Detection](../ad/) plugin. You can create and delete detectors, start and stop them, and use profiles to easily access different clusters or sign requests with different credentials.
+Currently, odfe-cli supports the [Anomaly Detection](../ad/) and [k-NN](../knn/) plugins, along with arbitrary REST API paths. Among other things, you can use odfe-cli to create and delete detectors, start and stop them, and check k-NN statistics.
+
+Profiles let you easily access different clusters or sign requests with different credentials. odfe-cli supports unauthenticated requests, HTTP basic authentication, and IAM signing for Amazon Web Services.

 This example moves a detector (`ecommerce-count-quantity`) from a staging cluster to a production cluster:

@@ -47,28 +49,25 @@ odfe-cli ad delete ecommerce-count-quantity --profile staging

 ## Profiles

-Profiles let you easily switch between different clusters and user credentials. To get started, run `odfe-cli profile create` and specify a unique profile name:
+Profiles let you easily switch between different clusters and user credentials. To get started, run `odfe-cli profile create` with the `--auth-type`, `--endpoint`, and `--name` options:

-```
-$ odfe-cli profile create
-Enter profile's name: default
-Elasticsearch Endpoint: https://localhost:9200
-User Name: <username>
-Password: <password>
+```bash
+odfe-cli profile create --auth-type basic --endpoint https://localhost:9200 --name docker-local
 ```

 Alternatively, save a configuration file to `~/.odfe-cli/config.yaml`:

 ```yaml
 profiles:
-- endpoint: https://localhost:9200
-  username: admin
-  password: foobar
-  name: default
-- endpoint: https://odfe-node1:9200
-  username: admin
-  password: foobar
-  name: dev
+- name: docker-local
+  endpoint: https://localhost:9200
+  user: admin
+  password: foobar
+- name: aws
+  endpoint: https://some-cluster.us-east-1.es.amazonaws.com
+  aws_iam:
+    profile: ""
+    service: es
 ```


@@ -83,7 +82,13 @@ odfe-cli <command> <subcommand> <flags>
 For example, the following command retrieves information about a detector:

 ```bash
-odfe-cli ad get my-detector --profile dev
+odfe-cli ad get my-detector --profile docker-local
+```
+
+For a request to the Elasticsearch CAT API, try the following command:
+
+```bash
+odfe-cli curl get --path _cat/plugins --profile aws
 ```

 Use the `-h` or `--help` flag to see all supported commands, subcommands, or usage for a specific command:
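As a quick illustration of the help flag described above (a sketch; the subcommands shown are the ones documented on this page):

```bash
# List top-level commands, then drill into a specific command's usage.
odfe-cli --help
odfe-cli ad --help
odfe-cli profile create --help
```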

docs/security/access-control/multi-tenancy.md

Lines changed: 1 addition & 1 deletion
@@ -114,7 +114,7 @@ After creating a tenant, give a role access to it using Kibana, the REST API, or

 #### REST API

-See [Create role](../API/#create-role).
+See [Create role](../api/#create-role).


 #### roles.yml
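For context, the linked Create role call that grants a role access to a tenant typically looks like the following. This is a sketch only; the role name, tenant pattern, endpoint, and credentials are placeholders, and the body follows the Open Distro Security roles API format:

```bash
# Hypothetical role that grants write access to a "human_resources" tenant.
curl -k -u admin:admin -XPUT "https://localhost:9200/_opendistro/_security/api/roles/hr_tenant_role" \
  -H 'Content-Type: application/json' \
  -d '{
    "tenant_permissions": [{
      "tenant_patterns": ["human_resources"],
      "allowed_actions": ["kibana_all_write"]
    }]
  }'
```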

docs/security/access-control/permissions.md

Lines changed: 1 addition & 0 deletions
@@ -117,6 +117,7 @@ Rather than creating new action groups from individual permissions, you can ofte
 - indices:admin/open
 - indices:admin/refresh
 - indices:admin/refresh*
+- indices:admin/resolve/index
 - indices:admin/rollover
 - indices:admin/seq_no/global_checkpoint_sync
 - indices:admin/settings/update
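If you do build an action group from individual permissions like these, the request is a single PUT to the Security API. A sketch, with a hypothetical group name, placeholder endpoint and credentials, and an arbitrary selection of permissions that includes the newly listed one:

```bash
# Hypothetical action group bundling a few index-admin permissions.
curl -k -u admin:admin -XPUT "https://localhost:9200/_opendistro/_security/api/actiongroups/my-index-admin" \
  -H 'Content-Type: application/json' \
  -d '{
    "allowed_actions": [
      "indices:admin/resolve/index",
      "indices:admin/refresh*",
      "indices:admin/open"
    ]
  }'
```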

docs/security/configuration/client-auth.md

Lines changed: 23 additions & 3 deletions
@@ -45,7 +45,7 @@ You can now assign your certificate's common name (CN) to a role. For this step,

 After deciding which role you want to map your certificate's CN to, you can use [Kibana](../../access-control/users-roles#map-users-to-roles), [`roles_mapping.yml`](../yaml/#roles_mappingyml), or the [REST API](../../access-control/api/#create-role-mapping) to map your certificate's CN to the role. The following example uses the `REST API` to map the common name `CLIENT1` to the role `readall`.

-#### Sample request
+**Sample request**

 ```json
 PUT _opendistro/_security/api/rolesmapping/readall
@@ -56,7 +56,7 @@ PUT _opendistro/_security/api/rolesmapping/readall
 }
 ```

-#### Sample response
+**Sample response**

 ```json
 {
@@ -78,10 +78,30 @@ headers = {
 }
 cert_file_path = "/full/path/to/client-cert.pem"
 key_file_path = "/full/path/to/client-cert-key.pem"
+root_ca_path = "/full/path/to/root-ca.pem"

 # Send the request.
 path = 'movies/_doc/3'
 url = base_url + path
-response = requests.get(url, cert = (cert_file_path, key_file_path), verify=False)
+response = requests.get(url, cert = (cert_file_path, key_file_path), verify=root_ca_path)
 print(response.text)
 ```
+
+## Configuring Beats
+
+You can also configure Beats so that they use a client certificate for authentication with Elasticsearch. Afterwards, they can start sending output to Elasticsearch.
+
+This output configuration specifies which settings you need for client certificate authentication:
+
+```yml
+output.elasticsearch:
+  enabled: true
+  # Array of hosts to connect to.
+  hosts: ["localhost:9200"]
+  # Protocol - either `http` (default) or `https`.
+  protocol: "https"
+  ssl.certificate_authorities: ["/full/path/to/CA.pem"]
+  ssl.verification_mode: certificate
+  ssl.certificate: "/full/path/to/client-cert.pem"
+  ssl.key: "/full/path/to/client-cert-key.pem"
+```
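The same client-certificate request can be checked quickly from the shell. A sketch, reusing the certificate, key, and root CA paths from the Python example; the `https://localhost:9200` host is an assumption, since the example's `base_url` is defined outside this hunk:

```bash
# Authenticate with a client certificate and verify the server
# against the root CA, mirroring the Python requests call above.
curl --cert /full/path/to/client-cert.pem \
     --key /full/path/to/client-cert-key.pem \
     --cacert /full/path/to/root-ca.pem \
     "https://localhost:9200/movies/_doc/3"
```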

docs/trace/data-prepper-reference.md

Lines changed: 28 additions & 6 deletions
@@ -7,7 +7,17 @@ nav_order: 25

 # Data Prepper configuration reference

-This page lists all supported Data Prepper sources, buffers, processors, and sinks, along with their associated options. For example configuration files, see [Data Prepper](../data-prepper/).
+This page lists all supported Data Prepper sources, buffers, preppers, and sinks, along with their associated options. For example configuration files, see [Data Prepper](../data-prepper/).
+
+
+## Data Prepper server options
+Option | Required | Description
+:--- | :--- | :---
+ssl | No | Boolean, indicating whether TLS should be used for server APIs. Defaults to true.
+keyStoreFilePath | No | String, path to a .jks or .p12 keystore file. Required if ssl is true.
+keyStorePassword | No | String, password for the keystore. Optional, defaults to an empty string.
+privateKeyPassword | No | String, password for the private key within the keystore. Optional, defaults to an empty string.
+serverPort | No | Integer, port number to use for server APIs. Defaults to 4900.


 ## General pipeline options
@@ -72,12 +82,12 @@ buffer_size | No | Integer, default 512. The maximum number of records the buffe
 batch_size | No | Integer, default 8. The maximum number of records the buffer drains after each read.


-## Processors
+## Preppers

-Processors perform some action on your data: filter, transform, enrich, etc.
+Preppers perform some action on your data: filter, transform, enrich, etc.


-### otel_trace_raw_processor
+### otel_trace_raw_prepper

 Converts OpenTelemetry data to Elasticsearch-compatible JSON documents. No options.

@@ -86,10 +96,22 @@ Converts OpenTelemetry data to Elasticsearch-compatible JSON documents. No optio

 Uses OpenTelemetry data to create a distributed service map for visualization in Kibana. No options.

+### peer_forwarder
+Forwards ExportTraceServiceRequests via gRPC to other Data Prepper instances. Required for operating Data Prepper in a clustered deployment.
+
+Option | Required | Description
+:--- | :--- | :---
+time_out | No | Integer, forwarded request timeout in seconds. Defaults to 3 seconds.
+span_agg_count | No | Integer, batch size for number of spans per request. Defaults to 48.
+discovery_mode | No | String, peer discovery mode to use. Allowable values are `static` and `dns`. Defaults to `static`.
+static_endpoints | No | List containing the string endpoints of all Data Prepper instances.
+domain_name | No | String, single domain name to query DNS against. Typically used by creating multiple DNS A records for the same domain.
+ssl | No | Boolean, indicating whether TLS should be used. Defaults to true.
+sslKeyCertChainFile | No | String, path to the security certificate.

 ### string_converter

-Converts strings to uppercase or lowercase. Mostly useful as an example if you want to develop your own processor.
+Converts strings to uppercase or lowercase. Mostly useful as an example if you want to develop your own prepper.

 Option | Required | Description
 :--- | :--- | :---
@@ -116,7 +138,7 @@ aws_region | No | String, AWS region for the cluster (e.g. `"us-east-1"`) if you
 trace_analytics_raw | No | Boolean, default false. Whether to export as trace data to the `otel-v1-apm-span-*` index pattern (alias `otel-v1-apm-span`) for use with the Trace Analytics Kibana plugin.
 trace_analytics_service_map | No | Boolean, default false. Whether to export as trace data to the `otel-v1-apm-service-map` index for use with the service map component of the Trace Analytics Kibana plugin.
 index | No | String, name of the index to export to. Only required if you don't use the `trace_analytics_raw` or `trace_analytics_service_map` presets.
-template_file | No | String, the path to a JSON [index template](https://opendistro.github.io/for-elasticsearch-docs/docs/elasticsearch/index-templates/) file (e.g. `/your/local/template-file.json` if you do not use the `trace_analytics_raw` or `trace_analytics_service_map`. See [otel-v1-apm-span-index-template.json](https://github.com/opendistro-for-elasticsearch/simple-ingest-transformation-utility-pipeline/blob/master/situp-plugins/elasticsearch/src/main/resources/otel-v1-apm-span-index-template.json) for an example.
+template_file | No | String, the path to a JSON [index template](https://opendistro.github.io/for-elasticsearch-docs/docs/elasticsearch/index-templates/) file (e.g. `/your/local/template-file.json`) if you do not use the `trace_analytics_raw` or `trace_analytics_service_map` presets. See [otel-v1-apm-span-index-template.json](https://github.com/opendistro-for-elasticsearch/data-prepper/blob/main/data-prepper-plugins/elasticsearch/src/main/resources/otel-v1-apm-span-index-template.json) for an example.
 document_id_field | No | String, the field from the source data to use for the Elasticsearch document ID (e.g. `"my-field"`) if you don't use the `trace_analytics_raw` or `trace_analytics_service_map` presets.
 dlq_file | No | String, the path to your preferred dead letter queue file (e.g. `/your/local/dlq-file`). Data Prepper writes to this file when it fails to index a document on the Elasticsearch cluster.
 bulk_size | No | Integer (long), default 5. The maximum size (in MiB) of bulk requests to the Elasticsearch cluster. Values below 0 indicate an unlimited size. If a single document exceeds the maximum bulk request size, Data Prepper sends it individually.
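The keystore referenced by `keyStoreFilePath` can be produced with standard Java tooling. A sketch only; the alias, password, common name, and validity period are placeholders, assuming `keytool` from a JDK is on your path:

```bash
# Create a PKCS12 keystore for the Data Prepper server's TLS endpoint.
keytool -genkeypair -alias data-prepper -keyalg RSA -keysize 2048 \
  -storetype PKCS12 -keystore keystore.p12 -storepass password \
  -dname "CN=localhost" -validity 365
```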

docs/trace/data-prepper.md

Lines changed: 24 additions & 13 deletions
@@ -23,7 +23,7 @@ Otherwise, [download](https://opendistro.github.io/for-elasticsearch/downloads.h

 ## Configure pipelines

-To use Data Prepper, you define pipelines in a configuration YAML file. Each pipeline is a combination of a source, a buffer, zero or more processors, and one or more sinks:
+To use Data Prepper, you define pipelines in a configuration YAML file. Each pipeline is a combination of a source, a buffer, zero or more preppers, and one or more sinks:

 ```yml
 sample-pipeline:
@@ -38,8 +38,8 @@ sample-pipeline:
     bounded_blocking:
       buffer_size: 1024 # max number of records the buffer accepts
       batch_size: 256 # max number of records the buffer drains after each read
-  processor:
-    - otel_trace_raw_processor:
+  prepper:
+    - otel_trace_raw_prepper:
   sink:
     - elasticsearch:
         hosts: ["https:localhost:9200"]
@@ -55,9 +55,9 @@ sample-pipeline:

   By default, Data Prepper uses its one and only buffer, the `bounded_blocking` buffer, so you can omit this section unless you developed a custom buffer or need to tune the buffer settings.

-- Processors perform some action on your data: filter, transform, enrich, etc.
+- Preppers perform some action on your data: filter, transform, enrich, etc.

-  You can have multiple processors, which run sequentially from top to bottom, not in parallel. The `otel_trace_raw_processor` processor converts OpenTelemetry data into Elasticsearch-compatible JSON documents.
+  You can have multiple preppers, which run sequentially from top to bottom, not in parallel. The `otel_trace_raw_prepper` prepper converts OpenTelemetry data into Elasticsearch-compatible JSON documents.

 - Sinks define where your data goes. In this case, the sink is an Open Distro for Elasticsearch cluster.

@@ -68,6 +68,9 @@ entry-pipeline:
   delay: "100"
   source:
     otel_trace_source:
+      ssl: true
+      sslKeyCertChainFile: "config/demo-data-prepper.crt"
+      sslKeyFile: "config/demo-data-prepper.key"
   sink:
     - pipeline:
         name: "raw-pipeline"
@@ -77,10 +80,8 @@ raw-pipeline:
   source:
     pipeline:
       name: "entry-pipeline"
-  processor:
-    - string_converter:
-        upper_case: true
-    - otel_trace_raw_processor:
+  prepper:
+    - otel_trace_raw_prepper:
   sink:
     - elasticsearch:
         hosts: ["https://localhost:9200" ]
@@ -93,7 +94,7 @@ service-map-pipeline:
   source:
     pipeline:
       name: "entry-pipeline"
-  processor:
+  prepper:
     - service_map_stateful:
   sink:
     - elasticsearch:
@@ -106,19 +107,29 @@ service-map-pipeline:

 To learn more, see the [Data Prepper configuration reference](../data-prepper-reference/).

+## Configure the Data Prepper server
+Data Prepper itself provides administrative HTTP endpoints, such as `/list` to list pipelines and `/metrics/prometheus` to provide Prometheus-compatible metrics data. The port that serves these endpoints, as well as the TLS configuration, is specified in a separate YAML file. Example:
+
+```yml
+ssl: true
+keyStoreFilePath: "/usr/share/data-prepper/keystore.jks"
+keyStorePassword: "password"
+privateKeyPassword: "other_password"
+serverPort: 1234
+```

 ## Start Data Prepper

 **Docker**

 ```bash
-docker run --name data-prepper --expose 21890 --read-only -v /full/path/to/my-data-prepper-config.yml:/usr/share/data-prepper/data-prepper.yml amazon/opendistro-for-elasticsearch-data-prepper:latest
+docker run --name data-prepper --expose 21890 -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml -v /full/path/to/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml amazon/opendistro-for-elasticsearch-data-prepper:latest
 ```

 **macOS and Linux**

 ```bash
-./data-prepper-tar-install.sh config/my-data-prepper-config.yml
+./data-prepper-tar-install.sh config/pipelines.yaml config/data-prepper-config.yaml
 ```

-For production workloads, you likely want to run Data Prepper on a dedicated machine, which makes connectivity a concern. Data Prepper uses port 21890 and must be able to connect to both the OpenTelemetry Collector and the Elasticsearch cluster. In the [sample applications](https://github.com/opendistro-for-elasticsearch/Data-Prepper/tree/master/examples), you can see that all components use the same Docker network and expose the appropriate ports.
+For production workloads, you likely want to run Data Prepper on a dedicated machine, which makes connectivity a concern. Data Prepper uses port 21890 and must be able to connect to both the OpenTelemetry Collector and the Elasticsearch cluster. In the [sample applications](https://github.com/opendistro-for-elasticsearch/Data-Prepper/tree/main/examples), you can see that all components use the same Docker network and expose the appropriate ports.
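Once the container or tarball install is running, the administrative endpoints described above can be probed directly. A sketch, assuming the server configuration shown earlier (TLS enabled on port 1234); `-k` skips certificate verification for a quick local check:

```bash
# List the configured pipelines, then scrape Prometheus-compatible metrics.
curl -k https://localhost:1234/list
curl -k https://localhost:1234/metrics/prometheus
```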

docs/trace/get-started.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ nav_order: 1

 # Get started with Trace Analytics

-Open Distro for Elasticsearch Trace Analytics consists of two components---Data Prepper and the Trace Analytics Kibana plugin---that fit into the OpenTelemetry and Elasticsearch ecosystems. The Data Prepper repository has several [sample applications](https://github.com/opendistro-for-elasticsearch/Data-Prepper/tree/master/examples) to help you get started.
+Open Distro for Elasticsearch Trace Analytics consists of two components---Data Prepper and the Trace Analytics Kibana plugin---that fit into the OpenTelemetry and Elasticsearch ecosystems. The Data Prepper repository has several [sample applications](https://github.com/opendistro-for-elasticsearch/data-prepper/tree/main/examples) to help you get started.


 ## Basic flow of data
@@ -29,7 +29,7 @@ Open Distro for Elasticsearch Trace Analytics consists of two components---Data

 One Trace Analytics sample application is the Jaeger HotROD demo, which mimics the flow of data through a distributed application.

-Download or clone the [Data Prepper repository](https://github.com/opendistro-for-elasticsearch/Data-Prepper/tree/master/examples). Then navigate to `examples/jaeger-hotrod/` and open `docker-compose.yml` in a text editor. This file contains a container for each element from [Basic flow of data](#basic-flow-of-data):
+Download or clone the [Data Prepper repository](https://github.com/opendistro-for-elasticsearch/data-prepper). Then navigate to `examples/jaeger-hotrod/` and open `docker-compose.yml` in a text editor. This file contains a container for each element from [Basic flow of data](#basic-flow-of-data):

 - A distributed application (`jaeger-hot-rod`) with the Jaeger agent (`jaeger-agent`)
 - The [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/getting-started/) (`otel-collector`)
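The clone-and-run steps described above can be scripted in a few lines. A sketch, assuming Git and Docker Compose are installed locally; the repository URL and `examples/jaeger-hotrod/` path come from the text, while `docker-compose up -d` is the usual way to start the compose file:

```bash
# Clone the repository and bring up the Jaeger HotROD sample application.
git clone https://github.com/opendistro-for-elasticsearch/data-prepper.git
cd data-prepper/examples/jaeger-hotrod
docker-compose up -d
```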
