
Commit 444e806

Merge pull request #27 from opendistro/master
merge
2 parents 24e90a2 + e3dc3b0

8 files changed, +102 -43 lines changed

docs/ad/index.md

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ In this case, a feature is the field in your index that you to check for anomali

 For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature.

-A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. We recommend adding fewer features to your detector for a higher accuracy. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `opendistro.anomaly_detection.max_anomaly_features` setting.
+A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. We recommend experimenting with different feature sets on a historical detector and checking the precision before moving on to real-time detectors. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `opendistro.anomaly_detection.max_anomaly_features` setting.
 {: .note }

 1. On the **Model configuration** page, enter the **Feature name**.
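For reference, the feature limit mentioned above can typically be changed through the cluster settings API. The following is a minimal sketch only, assuming the setting is dynamic on your cluster; the endpoint, `-k` flag, and `admin:admin` credentials are placeholders for a local Open Distro install, not part of this commit:

```bash
# Raise or lower the per-detector feature limit (placeholder values).
curl -k -u admin:admin -XPUT "https://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"opendistro.anomaly_detection.max_anomaly_features": 3}}'
```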

docs/cli/index.md

Lines changed: 22 additions & 17 deletions
@@ -9,7 +9,9 @@ has_children: false

 The Open Distro for Elasticsearch command line interface (odfe-cli) lets you manage your ODFE cluster from the command line and automate tasks.

-Currently, odfe-cli only supports the [Anomaly Detection](../ad/) plugin. You can create and delete detectors, start and stop them, and use profiles to easily access different clusters or sign requests with different credentials.
+Currently, odfe-cli supports the [Anomaly Detection](../ad/) and [k-NN](../knn/) plugins, along with arbitrary REST API paths. Among other things, you can use odfe-cli to create and delete detectors, start and stop them, and check k-NN statistics.
+
+Profiles let you easily access different clusters or sign requests with different credentials. odfe-cli supports unauthenticated requests, HTTP basic authentication, and IAM signing for Amazon Web Services.

 This example moves a detector (`ecommerce-count-quantity`) from a staging cluster to a production cluster:

@@ -47,28 +49,25 @@ odfe-cli ad delete ecommerce-count-quantity --profile staging

 ## Profiles

-Profiles let you easily switch between different clusters and user credentials. To get started, run `odfe-cli profile create` and specify a unique profile name:
+Profiles let you easily switch between different clusters and user credentials. To get started, run `odfe-cli profile create` with the `--auth-type`, `--endpoint`, and `--name` options:

-```
-$ odfe-cli profile create
-Enter profile's name: default
-Elasticsearch Endpoint: https://localhost:9200
-User Name: <username>
-Password: <password>
+```bash
+odfe-cli profile create --auth-type basic --endpoint https://localhost:9200 --name docker-local
 ```

 Alternatively, save a configuration file to `~/.odfe-cli/config.yaml`:

 ```yaml
 profiles:
-- endpoint: https://localhost:9200
-  username: admin
-  password: foobar
-  name: default
-- endpoint: https://odfe-node1:9200
-  username: admin
-  password: foobar
-  name: dev
+- name: docker-local
+  endpoint: https://localhost:9200
+  user: admin
+  password: foobar
+- name: aws
+  endpoint: https://some-cluster.us-east-1.es.amazonaws.com
+  aws_iam:
+    profile: ""
+    service: es
 ```


@@ -83,7 +82,13 @@ odfe-cli <command> <subcommand> <flags>
 For example, the following command retrieves information about a detector:

 ```bash
-odfe-cli ad get my-detector --profile dev
+odfe-cli ad get my-detector --profile docker-local
+```
+
+For a request to the Elasticsearch CAT API, try the following command:
+
+```bash
+odfe-cli curl get --path _cat/plugins --profile aws
 ```

 Use the `-h` or `--help` flag to see all supported commands, subcommands, or usage for a specific command:
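As a quick illustration of the help flag described above (a sketch; the subcommands shown are the ones documented on this page):

```bash
# List top-level commands, then drill into a specific command's usage.
odfe-cli --help
odfe-cli ad --help
odfe-cli profile create --help
```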

docs/security/access-control/multi-tenancy.md

Lines changed: 1 addition & 1 deletion
@@ -114,7 +114,7 @@ After creating a tenant, give a role access to it using Kibana, the REST API, or

 #### REST API

-See [Create role](../API/#create-role).
+See [Create role](../api/#create-role).


 #### roles.yml
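For context, the linked Create role call that grants a role access to a tenant typically looks like the following. This is a sketch only; the role name, tenant pattern, endpoint, and credentials are placeholders, and the body follows the Open Distro Security roles API format:

```bash
# Hypothetical role that grants write access to a "human_resources" tenant.
curl -k -u admin:admin -XPUT "https://localhost:9200/_opendistro/_security/api/roles/hr_tenant_role" \
  -H 'Content-Type: application/json' \
  -d '{
    "tenant_permissions": [{
      "tenant_patterns": ["human_resources"],
      "allowed_actions": ["kibana_all_write"]
    }]
  }'
```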

docs/security/access-control/permissions.md

Lines changed: 1 addition & 0 deletions
@@ -117,6 +117,7 @@ Rather than creating new action groups from individual permissions, you can ofte
 - indices:admin/open
 - indices:admin/refresh
 - indices:admin/refresh*
+- indices:admin/resolve/index
 - indices:admin/rollover
 - indices:admin/seq_no/global_checkpoint_sync
 - indices:admin/settings/update
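If you do build an action group from individual permissions like these, the request is a single PUT to the Security API. A sketch, with a hypothetical group name, placeholder endpoint and credentials, and an arbitrary selection of permissions that includes the newly listed one:

```bash
# Hypothetical action group bundling a few index-admin permissions.
curl -k -u admin:admin -XPUT "https://localhost:9200/_opendistro/_security/api/actiongroups/my-index-admin" \
  -H 'Content-Type: application/json' \
  -d '{
    "allowed_actions": [
      "indices:admin/resolve/index",
      "indices:admin/refresh*",
      "indices:admin/open"
    ]
  }'
```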

docs/security/configuration/client-auth.md

Lines changed: 23 additions & 3 deletions
@@ -45,7 +45,7 @@ You can now assign your certificate's common name (CN) to a role. For this step,

 After deciding which role you want to map your certificate's CN to, you can use [Kibana](../../access-control/users-roles#map-users-to-roles), [`roles_mapping.yml`](../yaml/#roles_mappingyml), or the [REST API](../../access-control/api/#create-role-mapping) to map your certificate's CN to the role. The following example uses the `REST API` to map the common name `CLIENT1` to the role `readall`.

-#### Sample request
+**Sample request**

 ```json
 PUT _opendistro/_security/api/rolesmapping/readall
@@ -56,7 +56,7 @@ PUT _opendistro/_security/api/rolesmapping/readall
 }
 ```

-#### Sample response
+**Sample response**

 ```json
 {
@@ -78,10 +78,30 @@ headers = {
 }
 cert_file_path = "/full/path/to/client-cert.pem"
 key_file_path = "/full/path/to/client-cert-key.pem"
+root_ca_path = "/full/path/to/root-ca.pem"

 # Send the request.
 path = 'movies/_doc/3'
 url = base_url + path
-response = requests.get(url, cert = (cert_file_path, key_file_path), verify=False)
+response = requests.get(url, cert = (cert_file_path, key_file_path), verify=root_ca_path)
 print(response.text)
 ```
+
+## Configuring Beats
+
+You can also configure Beats so that they use a client certificate for authentication with Elasticsearch. Afterwards, they can start sending output to Elasticsearch.
+
+This output configuration specifies which settings you need for client certificate authentication:
+
+```yml
+output.elasticsearch:
+  enabled: true
+  # Array of hosts to connect to.
+  hosts: ["localhost:9200"]
+  # Protocol - either `http` (default) or `https`.
+  protocol: "https"
+  ssl.certificate_authorities: ["/full/path/to/CA.pem"]
+  ssl.verification_mode: certificate
+  ssl.certificate: "/full/path/to/client-cert.pem"
+  ssl.key: "/full/path/to/client-cert-key.pem"
+```
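The same client-certificate request can be checked quickly from the shell. A sketch, reusing the certificate, key, and root CA paths from the Python example; the `https://localhost:9200` host is an assumption, since the example's `base_url` is defined outside this hunk:

```bash
# Authenticate with a client certificate and verify the server
# against the root CA, mirroring the Python requests call above.
curl --cert /full/path/to/client-cert.pem \
     --key /full/path/to/client-cert-key.pem \
     --cacert /full/path/to/root-ca.pem \
     "https://localhost:9200/movies/_doc/3"
```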

docs/trace/data-prepper-reference.md

Lines changed: 28 additions & 6 deletions
@@ -7,7 +7,17 @@ nav_order: 25

 # Data Prepper configuration reference

-This page lists all supported Data Prepper sources, buffers, processors, and sinks, along with their associated options. For example configuration files, see [Data Prepper](../data-prepper/).
+This page lists all supported Data Prepper sources, buffers, preppers, and sinks, along with their associated options. For example configuration files, see [Data Prepper](../data-prepper/).
+
+
+## Data Prepper server options
+Option | Required | Description
+:--- | :--- | :---
+ssl | No | Boolean, indicating whether TLS should be used for server APIs. Defaults to true.
+keyStoreFilePath | No | String, path to a .jks or .p12 keystore file. Required if ssl is true.
+keyStorePassword | No | String, password for the keystore. Optional, defaults to an empty string.
+privateKeyPassword | No | String, password for the private key within the keystore. Optional, defaults to an empty string.
+serverPort | No | Integer, port number to use for server APIs. Defaults to 4900.


 ## General pipeline options
@@ -72,12 +82,12 @@ buffer_size | No | Integer, default 512. The maximum number of records the buffe
 batch_size | No | Integer, default 8. The maximum number of records the buffer drains after each read.


-## Processors
+## Preppers

-Processors perform some action on your data: filter, transform, enrich, etc.
+Preppers perform some action on your data: filter, transform, enrich, etc.


-### otel_trace_raw_processor
+### otel_trace_raw_prepper

 Converts OpenTelemetry data to Elasticsearch-compatible JSON documents. No options.

@@ -86,10 +96,22 @@ Converts OpenTelemetry data to Elasticsearch-compatible JSON documents. No optio

 Uses OpenTelemetry data to create a distributed service map for visualization in Kibana. No options.

+### peer_forwarder
+Forwards ExportTraceServiceRequests via gRPC to other Data Prepper instances. Required for operating Data Prepper in a clustered deployment.
+
+Option | Required | Description
+:--- | :--- | :---
+time_out | No | Integer, forwarded request timeout in seconds. Defaults to 3 seconds.
+span_agg_count | No | Integer, batch size for number of spans per request. Defaults to 48.
+discovery_mode | No | String, peer discovery mode to use. Allowable values are `static` and `dns`. Defaults to `static`.
+static_endpoints | No | List containing the string endpoints of all Data Prepper instances.
+domain_name | No | String, single domain name to query DNS against. Typically used by creating multiple DNS A records for the same domain.
+ssl | No | Boolean, indicating whether TLS should be used. Defaults to true.
+sslKeyCertChainFile | No | String, path to the security certificate.

 ### string_converter

-Converts strings to uppercase or lowercase. Mostly useful as an example if you want to develop your own processor.
+Converts strings to uppercase or lowercase. Mostly useful as an example if you want to develop your own prepper.

 Option | Required | Description
 :--- | :--- | :---
@@ -116,7 +138,7 @@ aws_region | No | String, AWS region for the cluster (e.g. `"us-east-1"`) if you
 trace_analytics_raw | No | Boolean, default false. Whether to export as trace data to the `otel-v1-apm-span-*` index pattern (alias `otel-v1-apm-span`) for use with the Trace Analytics Kibana plugin.
 trace_analytics_service_map | No | Boolean, default false. Whether to export as trace data to the `otel-v1-apm-service-map` index for use with the service map component of the Trace Analytics Kibana plugin.
 index | No | String, name of the index to export to. Only required if you don't use the `trace_analytics_raw` or `trace_analytics_service_map` presets.
-template_file | No | String, the path to a JSON [index template](https://opendistro.github.io/for-elasticsearch-docs/docs/elasticsearch/index-templates/) file (e.g. `/your/local/template-file.json` if you do not use the `trace_analytics_raw` or `trace_analytics_service_map`. See [otel-v1-apm-span-index-template.json](https://github.com/opendistro-for-elasticsearch/simple-ingest-transformation-utility-pipeline/blob/master/situp-plugins/elasticsearch/src/main/resources/otel-v1-apm-span-index-template.json) for an example.
+template_file | No | String, the path to a JSON [index template](https://opendistro.github.io/for-elasticsearch-docs/docs/elasticsearch/index-templates/) file (e.g. `/your/local/template-file.json`) if you do not use the `trace_analytics_raw` or `trace_analytics_service_map` presets. See [otel-v1-apm-span-index-template.json](https://github.com/opendistro-for-elasticsearch/data-prepper/blob/main/data-prepper-plugins/elasticsearch/src/main/resources/otel-v1-apm-span-index-template.json) for an example.
 document_id_field | No | String, the field from the source data to use for the Elasticsearch document ID (e.g. `"my-field"`) if you don't use the `trace_analytics_raw` or `trace_analytics_service_map` presets.
 dlq_file | No | String, the path to your preferred dead letter queue file (e.g. `/your/local/dlq-file`). Data Prepper writes to this file when it fails to index a document on the Elasticsearch cluster.
 bulk_size | No | Integer (long), default 5. The maximum size (in MiB) of bulk requests to the Elasticsearch cluster. Values below 0 indicate an unlimited size. If a single document exceeds the maximum bulk request size, Data Prepper sends it individually.
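The keystore referenced by `keyStoreFilePath` can be produced with standard Java tooling. A sketch only; the alias, password, common name, and validity period are placeholders, assuming `keytool` from a JDK is on your path:

```bash
# Create a PKCS12 keystore for the Data Prepper server's TLS endpoint.
keytool -genkeypair -alias data-prepper -keyalg RSA -keysize 2048 \
  -storetype PKCS12 -keystore keystore.p12 -storepass password \
  -dname "CN=localhost" -validity 365
```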

docs/trace/data-prepper.md

Lines changed: 24 additions & 13 deletions
@@ -23,7 +23,7 @@ Otherwise, [download](https://opendistro.github.io/for-elasticsearch/downloads.h

 ## Configure pipelines

-To use Data Prepper, you define pipelines in a configuration YAML file. Each pipeline is a combination of a source, a buffer, zero or more processors, and one or more sinks:
+To use Data Prepper, you define pipelines in a configuration YAML file. Each pipeline is a combination of a source, a buffer, zero or more preppers, and one or more sinks:

 ```yml
 sample-pipeline:
@@ -38,8 +38,8 @@ sample-pipeline:
     bounded_blocking:
       buffer_size: 1024 # max number of records the buffer accepts
       batch_size: 256 # max number of records the buffer drains after each read
-  processor:
-    - otel_trace_raw_processor:
+  prepper:
+    - otel_trace_raw_prepper:
   sink:
     - elasticsearch:
         hosts: ["https:localhost:9200"]
@@ -55,9 +55,9 @@ sample-pipeline:

   By default, Data Prepper uses its one and only buffer, the `bounded_blocking` buffer, so you can omit this section unless you developed a custom buffer or need to tune the buffer settings.

-- Processors perform some action on your data: filter, transform, enrich, etc.
+- Preppers perform some action on your data: filter, transform, enrich, etc.

-  You can have multiple processors, which run sequentially from top to bottom, not in parallel. The `otel_trace_raw_processor` processor converts OpenTelemetry data into Elasticsearch-compatible JSON documents.
+  You can have multiple preppers, which run sequentially from top to bottom, not in parallel. The `otel_trace_raw_prepper` prepper converts OpenTelemetry data into Elasticsearch-compatible JSON documents.

 - Sinks define where your data goes. In this case, the sink is an Open Distro for Elasticsearch cluster.

@@ -68,6 +68,9 @@ entry-pipeline:
   delay: "100"
   source:
     otel_trace_source:
+      ssl: true
+      sslKeyCertChainFile: "config/demo-data-prepper.crt"
+      sslKeyFile: "config/demo-data-prepper.key"
   sink:
     - pipeline:
         name: "raw-pipeline"
@@ -77,10 +80,8 @@ raw-pipeline:
   source:
     pipeline:
       name: "entry-pipeline"
-  processor:
-    - string_converter:
-        upper_case: true
-    - otel_trace_raw_processor:
+  prepper:
+    - otel_trace_raw_prepper:
   sink:
     - elasticsearch:
         hosts: ["https://localhost:9200" ]
@@ -93,7 +94,7 @@ service-map-pipeline:
   source:
     pipeline:
       name: "entry-pipeline"
-  processor:
+  prepper:
     - service_map_stateful:
   sink:
     - elasticsearch:
@@ -106,19 +107,29 @@ service-map-pipeline:

 To learn more, see the [Data Prepper configuration reference](../data-prepper-reference/).

+## Configure the Data Prepper server
+Data Prepper itself provides administrative HTTP endpoints, such as `/list` to list pipelines and `/metrics/prometheus` to provide Prometheus-compatible metrics data. The port that serves these endpoints, as well as the TLS configuration, is specified in a separate YAML file. Example:
+
+```yml
+ssl: true
+keyStoreFilePath: "/usr/share/data-prepper/keystore.jks"
+keyStorePassword: "password"
+privateKeyPassword: "other_password"
+serverPort: 1234
+```

 ## Start Data Prepper

 **Docker**

 ```bash
-docker run --name data-prepper --expose 21890 --read-only -v /full/path/to/my-data-prepper-config.yml:/usr/share/data-prepper/data-prepper.yml amazon/opendistro-for-elasticsearch-data-prepper:latest
+docker run --name data-prepper --expose 21890 -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml -v /full/path/to/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml amazon/opendistro-for-elasticsearch-data-prepper:latest
 ```

 **macOS and Linux**

 ```bash
-./data-prepper-tar-install.sh config/my-data-prepper-config.yml
+./data-prepper-tar-install.sh config/pipelines.yaml config/data-prepper-config.yaml
 ```

-For production workloads, you likely want to run Data Prepper on a dedicated machine, which makes connectivity a concern. Data Prepper uses port 21890 and must be able to connect to both the OpenTelemetry Collector and the Elasticsearch cluster. In the [sample applications](https://github.com/opendistro-for-elasticsearch/Data-Prepper/tree/master/examples), you can see that all components use the same Docker network and expose the appropriate ports.
+For production workloads, you likely want to run Data Prepper on a dedicated machine, which makes connectivity a concern. Data Prepper uses port 21890 and must be able to connect to both the OpenTelemetry Collector and the Elasticsearch cluster. In the [sample applications](https://github.com/opendistro-for-elasticsearch/Data-Prepper/tree/main/examples), you can see that all components use the same Docker network and expose the appropriate ports.
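Once the container or tarball install is running, the administrative endpoints described above can be probed directly. A sketch, assuming the server configuration shown earlier (TLS enabled on port 1234); `-k` skips certificate verification for a quick local check:

```bash
# List the configured pipelines, then scrape Prometheus-compatible metrics.
curl -k https://localhost:1234/list
curl -k https://localhost:1234/metrics/prometheus
```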

docs/trace/get-started.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ nav_order: 1

 # Get started with Trace Analytics

-Open Distro for Elasticsearch Trace Analytics consists of two components---Data Prepper and the Trace Analytics Kibana plugin---that fit into the OpenTelemetry and Elasticsearch ecosystems. The Data Prepper repository has several [sample applications](https://github.com/opendistro-for-elasticsearch/Data-Prepper/tree/master/examples) to help you get started.
+Open Distro for Elasticsearch Trace Analytics consists of two components---Data Prepper and the Trace Analytics Kibana plugin---that fit into the OpenTelemetry and Elasticsearch ecosystems. The Data Prepper repository has several [sample applications](https://github.com/opendistro-for-elasticsearch/data-prepper/tree/main/examples) to help you get started.


 ## Basic flow of data
@@ -29,7 +29,7 @@ Open Distro for Elasticsearch Trace Analytics consists of two components---Data

 One Trace Analytics sample application is the Jaeger HotROD demo, which mimics the flow of data through a distributed application.

-Download or clone the [Data Prepper repository](https://github.com/opendistro-for-elasticsearch/Data-Prepper/tree/master/examples). Then navigate to `examples/jaeger-hotrod/` and open `docker-compose.yml` in a text editor. This file contains a container for each element from [Basic flow of data](#basic-flow-of-data):
+Download or clone the [Data Prepper repository](https://github.com/opendistro-for-elasticsearch/data-prepper). Then navigate to `examples/jaeger-hotrod/` and open `docker-compose.yml` in a text editor. This file contains a container for each element from [Basic flow of data](#basic-flow-of-data):

 - A distributed application (`jaeger-hot-rod`) with the Jaeger agent (`jaeger-agent`)
 - The [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/getting-started/) (`otel-collector`)
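The clone-and-run steps described above can be scripted in a few lines. A sketch, assuming Git and Docker Compose are installed locally; the repository URL and `examples/jaeger-hotrod/` path come from the text, while `docker-compose up -d` is the usual way to start the compose file:

```bash
# Clone the repository and bring up the Jaeger HotROD sample application.
git clone https://github.com/opendistro-for-elasticsearch/data-prepper.git
cd data-prepper/examples/jaeger-hotrod
docker-compose up -d
```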
