diff --git a/manage-data/ingest/ingesting-data-from-applications/ingest-data-from-beats-to-elasticsearch-service-with-logstash-as-proxy.md b/manage-data/ingest/ingesting-data-from-applications/ingest-data-from-beats-to-elasticsearch-service-with-logstash-as-proxy.md index 6eb89bde47..cf8928ecc3 100644 --- a/manage-data/ingest/ingesting-data-from-applications/ingest-data-from-beats-to-elasticsearch-service-with-logstash-as-proxy.md +++ b/manage-data/ingest/ingesting-data-from-applications/ingest-data-from-beats-to-elasticsearch-service-with-logstash-as-proxy.md @@ -4,7 +4,7 @@ mapped_urls: - https://www.elastic.co/guide/en/cloud-enterprise/current/ece-getting-started-search-use-cases-beats-logstash.html --- -# Ingest data from Beats to Elasticsearch Service with Logstash as a proxy +# Ingest data from Beats to Elastic Cloud with Logstash as a proxy % What needs to be done: Refine @@ -55,4 +55,509 @@ $$$ece-beats-logstash-metricbeat$$$ $$$ece-beats-logstash-stdout$$$ -$$$ece-beats-logstash-view-kibana$$$ \ No newline at end of file +$$$ece-beats-logstash-view-kibana$$$ + +This guide explains how to ingest data from Filebeat and Metricbeat to {{ls}} as an intermediary, and then send that data to Elasticsearch Service. Using {{ls}} as a proxy limits your Elastic stack traffic through a single, external-facing firewall exception or rule. Consider the following features of this type of setup: + +* You can send multiple instances of Beats data through your local network’s demilitarized zone (DMZ) to {{ls}}. {{ls}} then acts as a proxy through your firewall to send the Beats data to Elasticsearch Service, as shown in the following diagram: + + ![A diagram showing data from multiple Beats into Logstash](../../../images/cloud-ec-logstash-beats-dataflow.png "") + +* This proxying reduces the firewall exceptions or rules necessary for Beats to communicate with Elasticsearch Service. It’s common to have many Beats dispersed across a network, each installed close to the data that it monitors, and each Beat individually communicating with an Elasticsearch Service deployment. Multiple Beats support multiple servers. Rather than configure each Beat to send its data directly to Elasticsearch Service, you can use {{ls}} to proxy this traffic through one firewall exception or rule. +* This setup is not suitable in simple scenarios when there is only one or a couple of Beats in use. {{ls}} makes the most sense for proxying when there are many Beats. + +The configuration in this example makes use of the System module, available for both Filebeat and Metricbeat. Filebeat’s System sends server system log details (that is, login success/failures, sudo *superuser do* command usage, and other key usage details). Metricbeat’s System module sends memory, CPU, disk, and other server usage metrics. + +*Time required: 1 hour* + + +## Create a deployment [ec-beats-logstash-trial] + +::::{tab-set} + +:::{tab-item} Elastic Cloud Hosted +1. [Get a free trial](https://cloud.elastic.co/registration?page=docs&placement=docs-body). +2. Log into [Elastic Cloud](https://cloud.elastic.co?page=docs&placement=docs-body). +3. Select **Create deployment**. +4. Give your deployment a name. You can leave all other settings at their default values. +5. Select **Create deployment** and save your Elastic deployment credentials. You need these credentials later on. +6. When the deployment is ready, click **Continue** and a page of **Setup guides** is displayed. To continue to the deployment homepage click **I’d like to do something else**. 
+ +Prefer not to subscribe to yet another service? You can also get Elasticsearch Service through [AWS, Azure, and GCP marketplaces](../../../deploy-manage/deploy/elastic-cloud/subscribe-from-marketplace.md). +::: + +:::{tab-item} Elastic Cloud Enterprise +1. Log into the Elastic Cloud Enterprise admin console. +2. Select **Create deployment**. +3. Give your deployment a name. You can leave all other settings at their default values. +4. Select **Create deployment** and save your Elastic deployment credentials. You need these credentials later on. +5. When the deployment is ready, click **Continue** and a page of **Setup guides** is displayed. To continue to the deployment homepage click **I’d like to do something else**. +::: + +:::: + +## Connect securely [ec-beats-logstash-connect-securely] + +When connecting to Elasticsearch Service you can use a Cloud ID to specify the connection details. You must pass the Cloud ID that you can find in the cloud console. Find your Cloud ID by going to the {{kib}} main menu and selecting Management > Integrations, and then selecting View deployment details. + +To connect to, stream data to, and issue queries with Elasticsearch Service, you need to think about authentication. Two authentication mechanisms are supported, *API key* and *basic authentication*. Here, to get you started quickly, we’ll show you how to use basic authentication, but you can also generate API keys as shown later on. API keys are safer and preferred for production environments. + + +## Set up {{ls}} [ec-beats-logstash-logstash] + +[Download](https://www.elastic.co/downloads/logstash) and unpack {{ls}} on the local machine that hosts Beats or another machine granted access to the Beats machines. + + +## Set up Metricbeat [ec-beats-logstash-metricbeat] + +Now that {{ls}} is downloaded and your Elasticsearch Service deployment is set up, you can configure Metricbeat to send operational data to {{ls}}. + +Install Metricbeat as close as possible to the service that you want to monitor. For example, if you have four servers with MySQL running, we recommend that you run Metricbeat on each server. This allows Metricbeat to access your service from *localhost*. This setup does not cause any additional network traffic and enables Metricbeat to collect metrics even in the event of network problems. Metrics from multiple Metricbeat instances are combined on the {{ls}} server. + +If you have multiple servers with metrics data, repeat the following steps to configure Metricbeat on each server. + +**Download Metricbeat** + +[Download Metricbeat](https://www.elastic.co/downloads/beats/metricbeat) and unpack it on the local server from which you want to collect data. + +**About Metricbeat modules** + +Metricbeat has [many modules](https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-modules.html) available that collect common metrics. You can [configure additional modules](https://www.elastic.co/guide/en/beats/metricbeat/current/configuration-metricbeat.html) as needed. For this example we’re using Metricbeat’s default configuration, which has the [System module](https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-system.html) enabled. The System module allows you to monitor servers with the default set of metrics: *cpu*, *load*, *memory*, *network*, *process*, *process_summary*, *socket_summary*, *filesystem*, *fsstat*, and *uptime*. 
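+
+For reference, the System module configuration is stored in *modules.d/system.yml* inside the Metricbeat install directory. A typical version of that file looks similar to the following sketch; the exact metricsets and collection periods can vary between Metricbeat versions, so treat it as illustrative rather than definitive:
+
+```txt
+- module: system
+  period: 10s
+  metricsets:
+    - cpu
+    - load
+    - memory
+    - network
+    - process
+    - process_summary
+    - socket_summary
+
+- module: system
+  period: 1m
+  metricsets:
+    - filesystem
+    - fsstat
+
+- module: system
+  period: 15m
+  metricsets:
+    - uptime
+```
+
+You don't need to edit this file for the steps in this guide, because the System module is enabled by default.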
+ +**Load the Metricbeat Kibana dashboards** + +Metricbeat comes packaged with example dashboards, visualizations, and searches for visualizing Metricbeat data in Kibana. Before you can use the dashboards, you need to create the data view (formerly *index pattern*) *metricbeat-**, and load the dashboards into Kibana. This needs to be done from a local Beats machine that has access to the Elasticsearch Service deployment. + +::::{note} +Beginning with Elastic Stack version 8.0, Kibana *index patterns* have been renamed to *data views*. To learn more, check the Kibana [What’s new in 8.0](https://www.elastic.co/guide/en/kibana/8.0/whats-new.html#index-pattern-rename) page. +:::: + + +1. Open a command line instance and then go to */metricbeat-/* +2. Run the following command: + +```txt +sudo ./metricbeat setup \ + -E cloud.id= \ <1> + -E cloud.auth=: <2> +``` + +1. Specify the Cloud ID of your Elasticsearch Service deployment. You can include or omit the `:` prefix at the beginning of the Cloud ID. Both versions work fine. Find your Cloud ID by going to the {{kib}} main menu and selecting Management > Integrations, and then selecting View deployment details. +2. Specify the username and password provided to you when creating the deployment. Make sure to keep the colon between ** and **.::::{important} +Depending on variables including the installation location, environment and local permissions, you might need to [change the ownership](https://www.elastic.co/guide/en/beats/libbeat/current/config-file-permissions.html) of the metricbeat.yml. + +You might encounter similar permissions hurdles as you work through multiple sections of this document. These permission requirements are there for a good reason, a security safeguard to prevent unauthorized access and modification of key Elastic files. + +If this isn’t a production environment and you want a fast-pass with less permissions hassles, then you can disable strict permission checks from the command line by using `--strict.perms=false` when executing Beats (for example, `./metricbeat --strict.perms=false`). + +Depending on your system, you may also find that some commands need to be run as root, by prefixing `sudo` to the command. + +:::: + + + + +Your results should be similar to the following: + +```txt +Index setup finished. +Loading dashboards (Kibana must be running and reachable) +Loaded dashboards +``` + + +## Configure Metricbeat to send data to {{ls}} [ec-beats-logstash-metricbeat-send] + +1. In */metricbeat-/* (where ** is the directory where Metricbeat is installed), open the *metricbeat.yml* configuration file for editing. +2. Scroll down to the *Elasticsearch Output* section. Place a comment pound sign (*#*) in front of *output.elasticsearch* and {{es}} *hosts*. +3. Scroll down to the *Logstash Output* section. Remove the comment pound sign (*#*) from in front of *output.logstash* and *hosts*, as follows: + +```txt +# ---------------- Logstash Output ----------------- +output.logstash: + # The Logstash hosts + hosts: ["localhost:5044"] <1> +``` + +1. Replace `localhost` and the port number with the hostname and port number where Logstash is listening. + + + +## Set up Filebeat [ec-beats-logstash-filebeat] + +The next step is to configure Filebeat to send operational data to Logstash. As with Metricbeat, install Filebeat as close as possible to the service that you want to monitor. 
+ +**Download Filebeat** + +[Download Filebeat](https://www.elastic.co/downloads/beats/filebeat) and unpack it on the local server from which you want to collect data. + +**Enable the Filebeat system module** + +Filebeat has [many modules](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-modules.html) available that collect common log types. You can [configure additional modules](https://www.elastic.co/guide/en/beats/filebeat/current/configuration-filebeat-modules.html) as needed. For this example we’re using Filebeat’s [System module](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-module-system.html). This module reads in the various system log files (with information including login successes or failures, sudo command usage, and other key usage details) based on the detected operating system. For this example, a Linux-based OS is used and Filebeat ingests logs from the */var/log/* folder. It’s important to verify that Filebeat is given permission to access your logs folder through standard file and folder permissions. + +1. Go to */filebeat-/modules.d/* where ** is the directory where Filebeat is installed. +2. Filebeat requires at least one fileset to be enabled. In file */filebeat-/modules.d/system.yml.disabled*, under both `syslog` and `auth` set `enabled` to `true`: + +```txt +- module: system + # Syslog + syslog: + enabled: true + + # Authorization logs + auth: + enabled: true +``` + +From the */filebeat-* directory, run the `filebeat modules` command as shown: + +```txt +./filebeat modules enable system +``` + +The system module is now enabled in Filebeat and it will be used the next time Filebeat starts. + +**Load the Filebeat Kibana dashboards** + +Filebeat comes packaged with example Kibana dashboards, visualizations, and searches for visualizing Filebeat data in Kibana. Before you can use the dashboards, you need to create the data view *filebeat-**, and load the dashboards into Kibana. This needs to be done from a Beats machine that has access to the Internet. + +1. Open a command line instance and then go to */filebeat-/* +2. Run the following command: + +```txt +sudo ./filebeat setup \ + -E cloud.id= \ <1> + -E cloud.auth=: <2> +``` + +1. Specify the Cloud ID of your Elasticsearch Service deployment. You can include or omit the `:` prefix at the beginning of the Cloud ID. Both versions work fine. Find your Cloud ID by going to the {{kib}} main menu and selecting Management > Integrations, and then selecting View deployment details. +2. Specify the username and password provided to you when creating the deployment. Make sure to keep the colon between ** and **.::::{important} +Depending on variables including the installation location, environment, and local permissions, you might need to [change the ownership](https://www.elastic.co/guide/en/beats/libbeat/current/config-file-permissions.html) of the filebeat.yml. +:::: + + + + +Your results should be similar to the following: + +```txt +Index setup finished. +Loading dashboards (Kibana must be running and reachable) +Loaded dashboards +Setting up ML using setup --machine-learning is going to be removed in 8.0.0. Please use the ML app instead. +See more: https://www.elastic.co/guide/en/machine-learning/current/index.html +Loaded machine learning job configurations +Loaded Ingest pipelines +``` + +1. Exit the CLI. + +The data views for *filebeat-** and *metricbeat-** are now available in {{es}}. To verify: + +1. [Login to Kibana](../../../deploy-manage/deploy/elastic-cloud/access-kibana.md). +2. 
Open the Kibana main menu and select **Management** and go to **Kibana** > **Data views**. +3. In the search bar, search for *data views*. +4. In the search results, choose *Kibana / Data Views Management*. + +**Finish configuring Filebeat** + +1. In */filebeat-/* (where ** is the directory where Filebeat is installed), open the *filebeat.yml* configuration file for editing. +2. Scroll down to the *Outputs* section. Place a comment pound sign (*#*) in front of *output.elasticsearch* and {{es}} *hosts*. +3. Scroll down to the *Logstash Output* section. Remove the comment pound sign (*#*) from in front of *output.logstash* and *hosts* as follows: + +```txt +# ---------------- Logstash Output ----------------- +output.logstash: + # The Logstash hosts + hosts: ["localhost:5044"] <1> +``` + +1. Replace `localhost` and the port number with the hostname and port number where Logstash is listening. + + + +## Configure {{ls}} to listen for Beats [ec-beats-logstash-listen] + +Now the Filebeat and Metricbeat are set up, let’s configure a {{ls}} pipeline to input data from Beats and send results to the standard output. This enables you to verify the data output before sending it for indexing in {{es}}. + +1. In */logstash-/*, create a new text file named *beats.conf*. +2. Copy and paste the following code into the new text file. This code creates a {{ls}} pipeline that listens for connections from Beats on port 5044 and writes to standard out (typically to your terminal) with formatting provided by the {{ls}} rubydebug output plugin. + + ```txt + input { + beats{port => 5044} <1> + } + output { + stdout{codec => rubydebug} <2> + } + ``` + + 1. {{ls}} listens for Beats input on the default port of 5044. Only one line is needed to do this. {{ls}} can handle input from many Beats of the same and also of varying types (Metricbeat, Filebeat, and others). + 2. This sends output to the standard output, which displays through your command line interface. This plugin enables you to verify the data before you send it to {{es}}, in a later step. + +3. Save the new *beats.conf* file in your Logstash folder. To learn more about the file format and options, check [{{ls}} Configuration Examples](https://www.elastic.co/guide/en/logstash/current/config-examples.html). + + +## Output {{ls}} data to stdout [ec-beats-logstash-stdout] + +Now, let’s try out the {{ls}} pipeline with the Metricbeats and Filebeats configurations from the prior steps. Each Beat sends data into a {{ls}} pipeline, and the results display on the standard output where you can verify that everything looks correct. + +**Test Metricbeat to stdout** + +1. Open a command line interface instance. Go to */logstash-/*, where is the directory where {{ls}} is installed, and start {{ls}} by running the following command: + + ```txt + bin/logstash -f beats.conf + ``` + +2. Open a second command line interface instance. Go to */metricbeat-/*, where is the directory where Metricbeat is installed, and start Metricbeat by running the following command: + + ```txt + ./metricbeat -c metricbeat.yml + ``` + +3. Switch back to your first command line interface instance with {{ls}}. Now, Metricbeat events are input into {{ls}} and the output data is directed to the standard output. 
Your results should be similar to the following: + + ```txt + "tags" => [ + [0] "beats_input_raw_event" + ], + "agent" => { + "type" => "metricbeat", + "name" => "john-VirtualBox", + "version" => "8.13.1", + "ephemeral_id" => "1e69064c-d49f-4ec0-8414-9ab79b6f27a4", + "id" => "1b6c39e8-025f-4310-bcf1-818930a411d4", + "hostname" => "john-VirtualBox" + }, + "service" => { + "type" => "system" + }, + "event" => { + "duration" => 39833, + "module" => "system", + "dataset" => "system.cpu" + }, + "@timestamp" => 2021-04-21T17:06:05.231Z, + "metricset" => { + "name" => "cpu", + "period" => 10000 + }, + "@version" => "1","host" => { + "id" => "939972095cf1459c8b22cc608eff85da", + "ip" => [ + [0] "10.0.2.15", + [1] "fe80::3700:763c:4ba3:e48c" + ], + "name" => "john-VirtualBox","mac" => [ + [0] "08:00:27:a3:c7:a9" + ], + "os" => { + "type" => "linux", + ``` + +4. Switch back to the Metricbeat command line instance. Enter *CTRL + C* to shut down Metricbeat, and then exit the CLI. +5. Switch back to the {{ls}} command line instance. Enter *CTRL + C* to shut down {{ls}}, and then exit the CLI. + +**Test Filebeat to stdout** + +1. Open a command line interface instance. Go to */logstash-/*, where is the directory where {{ls}} is installed, and start {{ls}} by running the following command: + + ```txt + bin/logstash -f beats.conf + ``` + +2. Open a second command line interface instance. Go to */filebeat-/*, where is the directory where Filebeat is installed, and start Filebeat by running the following command: + + ```txt + ./filebeat -c filebeat.yml + ``` + +3. Switch back to your first command line interface instance with {{ls}}. Now, Filebeat events are input into {{ls}} and the output data is directed to the standard output. Your results should be similar to the following: + + ```txt + { + "service" => { + "type" => "system" + }, + "event" => { + "timezone" => "-04:00", + "dataset" => "system.syslog", + "module" => "system" + }, + "fileset" => { + "name" => "syslog" + }, + "agent" => { + "id" => "113dc127-21fa-4ebb-ab86-8a151d6a23a6", + "type" => "filebeat", + "version" => "8.13.1", + "hostname" => "john-VirtualBox", + "ephemeral_id" => "1058ad74-8494-4a5e-9f48-ad7c5b9da915", + "name" => "john-VirtualBox" + }, + "@timestamp" => 2021-04-28T15:33:41.727Z, + "input" => { + "type" => "log" + }, + "ecs" => { + "version" => "1.8.0" + }, + "@version" => "1", + "log" => { + "offset" => 73281, + "file" => { + "path" => "/var/log/syslog" + } + }, + ``` + +4. Review the {{ls}} output results to make sure your data looks correct. Enter *CTRL + C* to shut down {{ls}}. +5. Switch back to the Filebeats CLI. Enter *CTRL + C* to shut down Filebeat. + + +## Output {{ls}} data to {{es}} [ec-beats-logstash-elasticsearch] + +In this section, you configure {{ls}} to send the Metricbeat and Filebeat data to {{es}}. You modify the *beats.conf* created earlier, and specify the output credentials needed for our Elasticsearch Service deployment. Then, you start {{ls}} to send the Beats data into {{es}}. + +1. In your */logstash-/* folder, open *beats.conf* for editing. +2. Replace the *output {}* section of the JSON with the following code: + + ```txt + output { + elasticsearch { + index => "%{[@metadata][beat]}-%{[@metadata][version]}" + ilm_enabled => true + cloud_id => ":" <1> + cloud_auth => "elastic:" <2> + ssl => true + # api_key => "" + } + } + ``` + + 1. Use the Cloud ID of your Elasticsearch Service deployment. You can include or omit the `:` prefix at the beginning of the Cloud ID. Both versions work fine. 
Find your Cloud ID by going to the {{kib}} main menu and selecting Management > Integrations, and then selecting View deployment details. + 2. the default usename is `elastic`. It is not recommended to use the `elastic` account for ingesting data as this is a superuser. We recommend using a user with reduced permissions, or an API Key with permissions specific to the indices or data streams that will be written to. Check the [Grant access to secured resources](https://www.elastic.co/guide/en/beats/filebeat/current/feature-roles.html) for information on the writer role and API Keys. Use the password provided when you created the deployment if using the `elastic` user, or the password used when creating a new ingest user with the roles specified in the [Grant access to secured resources](https://www.elastic.co/guide/en/beats/filebeat/current/feature-roles.html) documentation. + + + Following are some additional details about the configuration file settings: + + * *index*: We specify the name of the {{es}} index with which to associate the Beats output. + + * *%{[@metadata][beat]}* sets the first part of the index name to the value of the Beat metadata field. + * *%{[@metadata][version]}* sets the second part of the index name to the Beat version. + + If you use Metricbeat version 8.13.1, the index created in {{es}} is named *metricbeat-8.13.1*. Similarly, using the 8.13.1 version of Filebeat, the {{es}} index is named *filebeat-8.13.1*. + + * *cloud_id*: This is the ID that uniquely identifies your Elasticsearch Service deployment. + * *ssl*: This should be set to `true` so that Secure Socket Layer (SSL) certificates are used for secure communication between {{ls}} and your Elasticsearch Service deployment. + * *ilm_enabled*: Enables and disables Elasticsearch Service [index lifecycle management](../../../manage-data/lifecycle/index-lifecycle-management.md). + * *api_key*: If you choose to use an API key to authenticate (as discussed in the next step), you can provide it here. + +3. **Optional**: For additional security, you can generate an {{es}} API key through the Elasticsearch Service console and configure {{ls}} to use the new key to connect securely to the Elasticsearch Service. + + 1. Log in to the [Elasticsearch Service Console](https://cloud.elastic.co?page=docs&placement=docs-body). + 2. Select the deployment and go to **☰** > **Management** > **Dev Tools**. + 3. Enter the following: + + ```json + POST /_security/api_key + { + "name": "logstash-apikey", + "role_descriptors": { + "logstash_read_write": { + "cluster": ["manage_index_templates", "monitor"], + "index": [ + { + "names": ["logstash-*","metricbeat-*","filebeat-*"], + "privileges": ["create_index", "write", "read", "manage"] + } + ] + } + } + } + ``` + + This creates an API key with the cluster `monitor` privilege which gives read-only access for determining the cluster state, and `manage_index_templates` which allows all operations on index templates. Some additional privileges also allow `create_index`, `write`, and `manage` operations for the specified index. The index `manage` privilege is added to enable index refreshes. + + 4. Click **▶**. The output should be similar to the following: + + ```json + { + "api_key": "aB1cdeF-GJI23jble4NOH4", + "id": "2GBe63fBcxgJAetmgZeh", + "name": "logstash_api_key" + } + ``` + + 5. Enter your new `api_key` value into the {{ls}} `beats.conf` file, in the format `:`. If your results were as shown in this example, you would enter `2GBe63fBcxgJAetmgZeh:aB1cdeF-GJI23jble4NOH4`. 
Remember to remove the pound (`#`) sign to uncomment the line, and comment out the `username` and `password` lines: + + ```txt + output { + elasticsearch { + index => "%{[@metadata][beat]}-%{[@metadata][version]}" + cloud_id => "" + ssl => true + ilm_enabled => true + api_key => "2GBe63fBcxgJAetmgZeh:aB1cdeF-GJI23jble4NOH4" + # user => "" + # password => "" + } + } + ``` + +4. Open a command line interface instance, go to your {{ls}} installation path, and start {{ls}}: + + ```txt + bin/logstash -f beats.conf + ``` + +5. Open a second command line interface instance, go to your Metricbeat installation path, and start Metricbeat: + + ```txt + ./metricbeat -c metricbeat.yml + ``` + +6. Open a third command line interface instance, go to your Filebeat installation path, and start Filebeat: + + ```txt + ./filebeat -c filebeat.yml + ``` + +7. {{ls}} now outputs the Filebeat and Metricbeat data to your Elasticsearch Service instance. + +::::{note} +In this guide, you manually launch each of the Elastic stack applications through the command line interface. In production, you may prefer to configure {{ls}}, Metricbeat, and Filebeat to run as System Services. Check the following pages for the steps to configure each application to run as a service: + +* [Running {{ls}} as a service on Debian or RPM](https://www.elastic.co/guide/en/logstash/current/running-logstash.html) +* [Metricbeat and systemd](https://www.elastic.co/guide/en/beats/metricbeat/current/running-with-systemd.html) +* [Start filebeat](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-starting.html) + +:::: + + + +## View data in Kibana [ec-beats-logstash-view-kibana] + +In this section, you log into Elasticsearch Service, open Kibana, and view the Kibana dashboards populated with our Metricbeat and Filebeat data. + +**View the Metricbeat dashboard** + +1. [Login to Kibana](../../../deploy-manage/deploy/elastic-cloud/access-kibana.md). +2. Open the Kibana main menu and select **Analytics**, then **Dashboard**. +3. In the search box, search for *metricbeat system*. The search results show several dashboards available for you to explore. +4. In the search results, choose *[Metricbeat System] Overview ECS*. A Metricbeat dashboard opens: + +![A screencapture of the Kibana dashboard named Metricbeat System Overview ECS](../../../images/cloud-ec-logstash-beats-metricbeat-dashboard.png "") + +**View the Filebeat dashboard** + +1. Open the Kibana main menu and select **Analytics**, then **Dashboard**. +2. In the search box, search for *filebeat system*. +3. In the search results, choose *[Filebeat System] Syslog dashboard ECS*. A Filebeat dashboard displaying your Filebeat data: + +![A screencapture of the Kibana dashboard named Filebeat System ECS](../../../images/cloud-ec-logstash-beats-filebeat-dashboard.png "") + +Now, you should have a good understanding of how to configure {{ls}} to ingest data from multiple Beats. You have the basics needed to begin experimenting with your own combination of Beats and modules. 
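+
+For quick reference, the complete `beats.conf` used in this guide combines the Beats input with the {{es}} output. The following is a minimal sketch; the Cloud ID and API key values are placeholders for your own deployment details, and you can use `cloud_auth` with a username and password instead of `api_key`, as shown earlier:
+
+```txt
+input {
+  beats { port => 5044 }
+}
+output {
+  elasticsearch {
+    index => "%{[@metadata][beat]}-%{[@metadata][version]}"
+    ilm_enabled => true
+    cloud_id => "DEPLOYMENT_NAME:CLOUD_ID"     # placeholder: use your deployment's Cloud ID
+    api_key => "API_KEY_ID:API_KEY_SECRET"     # placeholder: or use cloud_auth instead
+    ssl => true
+  }
+}
+```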
+ + diff --git a/manage-data/ingest/ingesting-data-from-applications/ingest-data-from-relational-database-into-elasticsearch-service.md b/manage-data/ingest/ingesting-data-from-applications/ingest-data-from-relational-database-into-elasticsearch-service.md index b60d9b49af..4b44c2f773 100644 --- a/manage-data/ingest/ingesting-data-from-applications/ingest-data-from-relational-database-into-elasticsearch-service.md +++ b/manage-data/ingest/ingesting-data-from-applications/ingest-data-from-relational-database-into-elasticsearch-service.md @@ -4,16 +4,7 @@ mapped_urls: - https://www.elastic.co/guide/en/cloud-enterprise/current/ece-getting-started-search-use-cases-db-logstash.html --- -# Ingest data from a relational database into Elasticsearch Service - -% What needs to be done: Refine - -% Scope notes: Merge ESS and ECE versions (should be pretty much identical) - -% Use migrated content from existing pages that map to this page: - -% - [ ] ./raw-migrated-files/cloud/cloud/ec-getting-started-search-use-cases-db-logstash.md -% - [ ] ./raw-migrated-files/cloud/cloud-enterprise/ece-getting-started-search-use-cases-db-logstash.md +# Ingest data from a relational database into Elastic Cloud % Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc): @@ -47,4 +38,404 @@ $$$ece-db-logstash-output$$$ $$$ece-db-logstash-pipeline$$$ -$$$ece-db-logstash-prerequisites$$$ \ No newline at end of file +$$$ece-db-logstash-prerequisites$$$ + +This guide explains how to ingest data from a relational database into {{ess}} through [{{ls}}](https://www.elastic.co/guide/en/logstash/current/introduction.html), using the Logstash [JDBC input plugin](https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html). It demonstrates how Logstash can be used to efficiently copy records and to receive updates from a relational database, and then send them into {{es}} in an Elasticsearch Service deployment. + +The code and methods presented here have been tested with MySQL. They should work with other relational databases. + +The Logstash Java Database Connectivity (JDBC) input plugin enables you to pull in data from many popular relational databases including MySQL and Postgres. Conceptually, the JDBC input plugin runs a loop that periodically polls the relational database for records that were inserted or modified since the last iteration of this loop. + +*Time required: 2 hours* + + +## Prerequisites [ec-db-logstash-prerequisites] + +For this tutorial you need a source MySQL instance for Logstash to read from. A free version of MySQL is available from the [MySQL Community Server section](https://dev.mysql.com/downloads/mysql/) of the MySQL Community Downloads site. + + +## Create a deployment [ec-db-logstash-trial] + +::::{tab-set} + +:::{tab-item} Elastic Cloud Hosted +1. [Get a free trial](https://cloud.elastic.co/registration?page=docs&placement=docs-body). +2. Log into [Elastic Cloud](https://cloud.elastic.co?page=docs&placement=docs-body). +3. Select **Create deployment**. +4. Give your deployment a name. You can leave all other settings at their default values. +5. Select **Create deployment** and save your Elastic deployment credentials. You need these credentials later on. +6. When the deployment is ready, click **Continue** and a page of **Setup guides** is displayed. To continue to the deployment homepage click **I’d like to do something else**. + +Prefer not to subscribe to yet another service? 
You can also get Elasticsearch Service through [AWS, Azure, and GCP marketplaces](../../../deploy-manage/deploy/elastic-cloud/subscribe-from-marketplace.md). +::: + +:::{tab-item} Elastic Cloud Enterprise +1. Log into the Elastic Cloud Enterprise admin console. +2. Select **Create deployment**. +3. Give your deployment a name. You can leave all other settings at their default values. +4. Select **Create deployment** and save your Elastic deployment credentials. You need these credentials later on. +5. When the deployment is ready, click **Continue** and a page of **Setup guides** is displayed. To continue to the deployment homepage click **I’d like to do something else**. +::: + +:::: + +## Connect securely [ec-db-logstash-connect-securely] + +When connecting to Elasticsearch Service you can use a Cloud ID to specify the connection details. Find your Cloud ID by going to the {{kib}} main menu and selecting Management > Integrations, and then selecting View deployment details. + +To connect to, stream data to, and issue queries with Elasticsearch Service, you need to think about authentication. Two authentication mechanisms are supported, *API key* and *basic authentication*. Here, to get you started quickly, we’ll show you how to use basic authentication, but you can also generate API keys as shown later on. API keys are safer and preferred for production environments. + +1. [Download](https://www.elastic.co/downloads/logstash) and unpack Logstash on the local machine that hosts MySQL or another machine granted access to the MySQL machine. + + +## Get the MySQL JDBC driver [ec-db-logstash-driver] + +The Logstash JDBC input plugin does not include any database connection drivers. You need a JDBC driver for your relational database for the steps in the later section [Configure a Logstash pipeline with the JDBC input plugin](../../../manage-data/ingest/ingesting-data-from-applications/ingest-data-from-relational-database-into-elasticsearch-service.md#ec-db-logstash-pipeline). + +1. Download and unpack the JDBC driver for MySQL from the [Connector/J section](https://dev.mysql.com/downloads/connector/j/) of the MySQL Community Downloads site. +2. Make a note of the driver’s location as it’s needed in the steps that follow. + + +## Prepare a source MySQL database [ec-db-logstash-database] + +Let’s look at a simple database from which you’ll import data and send it to Elasticsearch Service. This example uses a MySQL database with timestamped records. The timestamps enable you to determine easily what’s changed in the database since the most recent data transfer to Elasticsearch Service. + + +### Consider the database structure and design [ec-db-logstash-database-structure] + +For this example, let’s create a new database *es_db* with table *es_table*, as the source of our Elasticsearch data. + +1. Run the following SQL statement to generate a new MySQL database with a three column table: + + ```txt + CREATE DATABASE es_db; + USE es_db; + DROP TABLE IF EXISTS es_table; + CREATE TABLE es_table ( + id BIGINT(20) UNSIGNED NOT NULL, + PRIMARY KEY (id), + UNIQUE KEY unique_id (id), + client_name VARCHAR(32) NOT NULL, + modification_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP + ); + ``` + + Let’s explore the key concepts in this SQL snippet: + + es_table + : The name of the table that stores the data. + + id + : The unique identifier for records. *id* is defined as both a PRIMARY KEY and UNIQUE KEY to guarantee that each *id* appears only once in the current table. 
This is translated to *_id* for updating or inserting the document into Elasticsearch. + + client_name + : The data that will ultimately be ingested into Elasticsearch. For simplicity, this example includes only a single data field. + + modification_time + : The timestamp of when the record was inserted or last updated. Further in, you can use this timestamp to determine what has changed since the last data transfer into Elasticsearch. + +2. Consider how to handle deletions and how to notify Elasticsearch about them. Often, deleting a record results in its immediate removal from the MySQL database. There’s no record of that deletion. The change isn’t detected by Logstash, so that record remains in Elasticsearch. + + There are two possible ways to address this: + + * You can use "soft deletes" in your source database. Essentially, a record is first marked for deletion through a boolean flag. Other programs that are currently using your source database would have to filter out "soft deletes" in their queries. The "soft deletes" are sent over to Elasticsearch, where they can be processed. After that, your source database and Elasticsearch must both remove these "soft deletes." + * You can periodically clear the Elasticsearch indices that are based off of the database, and then refresh Elasticsearch with a fresh ingest of the contents of the database. + +3. Log in to your MySQL server and add three records to your new database: + + ```txt + use es_db + INSERT INTO es_table (id, client_name) + VALUES (1,"Targaryen"), + (2,"Lannister"), + (3,"Stark"); + ``` + +4. Verify your data with a SQL statement: + + ```txt + select * from es_table; + ``` + + The output should look similar to the following: + + ```txt + +----+-------------+---------------------+ + | id | client_name | modification_time | + +----+-------------+---------------------+ + | 1 | Targaryen | 2021-04-21 12:17:16 | + | 2 | Lannister | 2021-04-21 12:17:16 | + | 3 | Stark | 2021-04-21 12:17:16 | + +----+-------------+---------------------+ + ``` + + Now, let’s go back to Logstash and configure it to ingest this data. + + + +## Configure a Logstash pipeline with the JDBC input plugin [ec-db-logstash-pipeline] + +Let’s set up a sample Logstash input pipeline to ingest data from your new JDBC Plugin and MySQL database. Beyond MySQL, you can input data from any database that supports JDBC. + +1. In `/logstash-7.12.0/`, create a new text file named `jdbc.conf`. +2. Copy and paste the following code into this new text file. This code creates a Logstash pipeline through a JDBC plugin. + + ```txt + input { + jdbc { + jdbc_driver_library => "/mysql-connector-java-.jar" <1> + jdbc_driver_class => "com.mysql.jdbc.Driver" + jdbc_connection_string => "jdbc:mysql://:3306/es_db" <2> + jdbc_user => "" <3> + jdbc_password => "" <3> + jdbc_paging_enabled => true + tracking_column => "unix_ts_in_secs" + use_column_value => true + tracking_column_type => "numeric" + schedule => "*/5 * * * * *" + statement => "SELECT *, UNIX_TIMESTAMP(modification_time) AS unix_ts_in_secs FROM es_table WHERE (UNIX_TIMESTAMP(modification_time) > :sql_last_value AND modification_time < NOW()) ORDER BY modification_time ASC" + } + } + filter { + mutate { + copy => { "id" => "[@metadata][_id]"} + remove_field => ["id", "@version", "unix_ts_in_secs"] + } + } + output { + stdout { codec => "rubydebug"} + } + ``` + + 1. Specify the full path to your local JDBC driver .jar file (including version number). 
For example: `jdbc_driver_library => "/usr/share/mysql/mysql-connector-java-8.0.24.jar"` + 2. Provide the IP address or hostname and the port of your MySQL host. For example, `jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/es_db"` + 3. Provide your MySQL credentials. The username and password must both be enclosed in quotation marks. + + + ::::{note} + If you are using MariaDB (a popular open source community fork of MySQL), there are a couple of things that you need to do differently: + + In place of the MySQL JDBC driver, download and unpack the [JDBC driver for MariaDB](https://downloads.mariadb.org/connector-java/). + + Substitute the following lines in the `jdbc.conf` code, including the `ANSI_QUOTES` snippet in the last line: + + ```txt + jdbc_driver_library => "/mariadb-java-client-.jar" + jdbc_driver_class => "org.mariadb.jdbc.Driver" + jdbc_connection_string => "jdbc:mariadb://:3306/es_db?sessionVariables=sql_mode=ANSI_QUOTES" + ``` + + :::: + + + Following are some additional details about the Logstash pipeline code: + + jdbc_driver_library + : The Logstash JDBC plugin does not come packaged with JDBC driver libraries. The JDBC driver library must be passed explicitly into the plugin using the `jdbc_driver_library` configuration option. + + tracking_column + : This parameter specifies the field `unix_ts_in_secs` that tracks the last document read by Logstash from MySQL, stored on disk in [logstash_jdbc_last_run](https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#plugins-inputs-jdbc-last_run_metadata_path). The parameter determines the starting value for documents that Logstash requests in the next iteration of its polling loop. The value stored in `logstash_jdbc_last_run` can be accessed in a SELECT statement as `sql_last_value`. + + unix_ts_in_secs + : The field generated by the SELECT statement, which contains the `modification_time` as a standard [Unix timestamp](https://en.wikipedia.org/wiki/Unix_time) (seconds since the epoch). The field is referenced by the `tracking column`. A Unix timestamp is used for tracking progress rather than a normal timestamp, as a normal timestamp may cause errors due to the complexity of correctly converting back and forth between UMT and the local timezone. + + sql_last_value + : This is a [built-in parameter](https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#_predefined_parameters) containing the starting point of the current iteration of the Logstash polling loop, and it is referenced in the SELECT statement line of the JDBC input configuration. This parameter is set to the most recent value of `unix_ts_in_secs`, which is read from `.logstash_jdbc_last_run`. This value is the starting point for documents returned by the MySQL query that is executed in the Logstash polling loop. Including this variable in the query guarantees that we’re not resending data that is already stored in Elasticsearch. + + schedule + : This uses cron syntax to specify how often Logstash should poll MySQL for changes. The specification `*/5 * * * * *` tells Logstash to contact MySQL every 5 seconds. Input from this plugin can be scheduled to run periodically according to a specific schedule. This scheduling syntax is powered by [rufus-scheduler](https://github.com/jmettraux/rufus-scheduler). The syntax is cron-like with some extensions specific to Rufus (for example, timezone support). + + modification_time < NOW() + : This portion of the SELECT is explained in detail in the next section. 
+ + filter + : In this section, the value `id` is copied from the MySQL record into a metadata field called `_id`, which is later referenced in the output to ensure that each document is written into Elasticsearch with the correct `_id` value. Using a metadata field ensures that this temporary value does not cause a new field to be created. The `id`, `@version`, and `unix_ts_in_secs` fields are also removed from the document, since they don’t need to be written to Elasticsearch. + + output + : This section specifies that each document should be written to the standard output using the rubydebug output to help with debugging. + +3. Launch Logstash with your new JDBC configuration file: + + ```txt + bin/logstash -f jdbc.conf + ``` + + Logstash outputs your MySQL data through standard output (`stdout`), your command line interface. The results for the initial data load should look similar to the following: + + ```txt + [INFO ] 2021-04-21 12:32:32.816 [Ruby-0-Thread-15: :1] jdbc - (0.009082s) SELECT * FROM (SELECT *, UNIX_TIMESTAMP(modification_time) AS unix_ts_in_secs FROM es_table WHERE (UNIX_TIMESTAMP(modification_time) > 0 AND modification_time < NOW()) ORDER BY modification_time ASC) AS 't1' LIMIT 100000 OFFSET 0 + { + "client_name" => "Targaryen", + "modification_time" => 2021-04-21T12:17:16.000Z, + "@timestamp" => 2021-04-21T12:17:16.923Z + } + { + "client_name" => "Lannister", + "modification_time" => 2021-04-21T12:17:16.000Z, + "@timestamp" => 2021-04-21T12:17:16.961Z + } + { + "client_name" => "Stark", + "modification_time" => 2021-04-21T12:17:16.000Z, + "@timestamp" => 2021-04-21T12:17:16.963Z + } + ``` + + The Logstash results periodically display SQL SELECT statements, even when there’s nothing new or modified in the MySQL database: + + ```txt + [INFO ] 2021-04-21 12:33:30.407 [Ruby-0-Thread-15: :1] jdbc - (0.002835s) SELECT count(*) AS 'count' FROM (SELECT *, UNIX_TIMESTAMP(modification_time) AS unix_ts_in_secs FROM es_table WHERE (UNIX_TIMESTAMP(modification_time) > 1618935436 AND modification_time < NOW()) ORDER BY modification_time ASC) AS 't1' LIMIT 1 + ``` + +4. Open your MySQL console. Let’s insert another record into that database using the following SQL statement: + + ```txt + use es_db + INSERT INTO es_table (id, client_name) + VALUES (4,"Baratheon"); + ``` + + Switch back to your Logstash console. Logstash detects the new record and the console displays results similar to the following: + + ```txt + [INFO ] 2021-04-21 12:37:05.303 [Ruby-0-Thread-15: :1] jdbc - (0.001205s) SELECT * FROM (SELECT *, UNIX_TIMESTAMP(modification_time) AS unix_ts_in_secs FROM es_table WHERE (UNIX_TIMESTAMP(modification_time) > 1618935436 AND modification_time < NOW()) ORDER BY modification_time ASC) AS 't1' LIMIT 100000 OFFSET 0 + { + "client_name" => "Baratheon", + "modification_time" => 2021-04-21T12:37:01.000Z, + "@timestamp" => 2021-04-21T12:37:05.312Z + } + ``` + +5. Review the Logstash output results to make sure your data looks correct. Use `CTRL + C` to shut down Logstash. + + +## Output to Elasticsearch [ec-db-logstash-output] + +In this section, we configure Logstash to send the MySQL data to Elasticsearch. We modify the configuration file created in the section [Configure a Logstash pipeline with the JDBC input plugin](../../../manage-data/ingest/ingesting-data-from-applications/ingest-data-from-relational-database-into-elasticsearch-service.md#ec-db-logstash-pipeline) so that data is output directly to Elasticsearch. 
We start Logstash to send the data, and then log into Elasticsearch Service to verify the data in Kibana. + +1. Open the `jdbc.conf` file in the Logstash folder for editing. +2. Update the output section with the one that follows: + + ```txt + output { + elasticsearch { + index => "rdbms_idx" + ilm_enabled => false + cloud_id => ":" <1> + cloud_auth => "elastic:" <2> + ssl => true + # api_key => "" + } + } + ``` + + 1. Use the Cloud ID of your Elasticsearch Service deployment. You can include or omit the `:` prefix at the beginning of the Cloud ID. Both versions work fine. Find your Cloud ID by going to the {{kib}} main menu and selecting Management > Integrations, and then selecting View deployment details. + 2. the default username is `elastic`. It is not recommended to use the `elastic` account for ingesting data as this is a superuser. We recommend using a user with reduced permissions, or an API Key with permissions specific to the indices or data streams that will be written to. Check [Configuring security in Logstash](https://www.elastic.co/guide/en/logstash/current/ls-security.html) for information on roles and API Keys. Use the password provided when you created the deployment if using the `elastic` user, or the password used when creating a new ingest user with the roles specified in the [Configuring security in Logstash](https://www.elastic.co/guide/en/logstash/current/ls-security.html) documentation. + + + Following are some additional details about the configuration file settings: + + index + : The name of the Elasticsearch index, `rdbms_idx`, to associate the documents. + + api_key + : If you choose to use an API key to authenticate (as discussed in the next step), you can provide it here. + +3. **Optional**: For additional security, you can generate an Elasticsearch API key through the Elasticsearch Service console and configure Logstash to use the new key to connect securely to Elasticsearch Service. + + 1. Log in to the [Elasticsearch Service Console](https://cloud.elastic.co?page=docs&placement=docs-body). + 2. Select the deployment name and go to **☰** > **Management** > **Dev Tools**. + 3. Enter the following: + + ```json + POST /_security/api_key + { + "name": "logstash-apikey", + "role_descriptors": { + "logstash_read_write": { + "cluster": ["manage_index_templates", "monitor"], + "index": [ + { + "names": ["logstash-*","rdbms_idx"], + "privileges": ["create_index", "write", "read", "manage"] + } + ] + } + } + } + ``` + + This creates an API key with the cluster `monitor` privilege which gives read-only access for determining the cluster state, and `manage_index_templates` allows all operations on index templates. Some additional privileges also allow `create_index`, `write`, and `manage` operations for the specified index. The index `manage` privilege is added to enable index refreshes. + + 4. Click **▶**. The output should be similar to the following: + + ```json + { + "api_key": "tV1dnfF-GHI59ykgv4N0U3", + "id": "2TBR42gBabmINotmvZjv", + "name": "logstash_api_key" + } + ``` + + 5. Enter your new `api_key` value into the Logstash `jdbc.conf` file, in the format `:`. If your results were as shown in this example, you would enter `2TBR42gBabmINotmvZjv:tV1dnfF-GHI59ykgv4N0U3`. 
Remember to remove the pound (`#`) sign to uncomment the line, and comment out the `username` and `password` lines:
+
+        ```txt
+        output {
+          elasticsearch {
+            index => "rdbms_idx"
+            cloud_id => ""
+            ssl => true
+            ilm_enabled => false
+            api_key => "2TBR42gBabmINotmvZjv:tV1dnfF-GHI59ykgv4N0U3"
+            # user => ""
+            # password => ""
+          }
+        }
+        ```
+
+4. At this point, if you simply restart Logstash as is with your new output, then no MySQL data is sent to our Elasticsearch index.
+
+    Why? Logstash retains the previous `sql_last_value` timestamp and sees that no new changes have occurred in the MySQL database since that time. Therefore, based on the SQL query that we configured, there’s no new data to send to Logstash.
+
+    Solution: Add `clean_run => true` as a new line in the JDBC input section of the `jdbc.conf` file. When set to `true`, this parameter resets `sql_last_value` back to zero.
+
+    ```txt
+    input {
+      jdbc {
+        ...
+        clean_run => true
+        ...
+      }
+    }
+    ```
+
+    After running Logstash once with `clean_run` set to `true`, you can remove the `clean_run` line, unless you prefer the reset behavior to happen again at each restart of Logstash.
+
+5. Open a command line interface instance, go to your Logstash installation path, and start Logstash:
+
+    ```txt
+    bin/logstash -f jdbc.conf
+    ```
+
+6. Logstash outputs the MySQL data to your Elasticsearch Service deployment. Let’s take a look in Kibana and verify that data:
+
+    1. Log in to the [Elasticsearch Service Console](https://cloud.elastic.co?page=docs&placement=docs-body).
+    2. Select the deployment and go to **☰** > **Management** > **Dev Tools**.
+    3. Copy and paste the following API GET request into the Console pane, and then click **▶**. This queries all records in the new `rdbms_idx` index.
+
+    ```txt
+    GET rdbms_idx/_search
+    {
+      "query": {
+        "match_all": {}
+      }
+    }
+    ```
+
+    4. The Results pane lists the `client_name` records originating from your MySQL database, similar to the following example:
+
+    ![A picture showing query results with three records](../../../images/cloud-ec-logstash-db-results-scenarios.png "")
+
+
+Now, you should have a good understanding of how to configure Logstash to ingest data from your relational database through the JDBC input plugin. You have some design considerations to track records that are new, modified, and deleted. You should have the basics needed to begin experimenting with your own database and Elasticsearch.
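+
+For quick reference, the complete `jdbc.conf` used in this guide combines the JDBC input, the filter, and the {{es}} output. The following is a minimal sketch; the driver path, MySQL host, database credentials, Cloud ID, and API key values are placeholders for your own details. It also adds a `document_id` setting so that the `[@metadata][_id]` value copied in the filter section is used as the {{es}} document ID:
+
+```txt
+input {
+  jdbc {
+    jdbc_driver_library => "PATH_TO_DRIVER/mysql-connector-java-VERSION.jar"   # placeholder path
+    jdbc_driver_class => "com.mysql.jdbc.Driver"
+    jdbc_connection_string => "jdbc:mysql://MYSQL_HOST:3306/es_db"              # placeholder host
+    jdbc_user => "MYSQL_USER"                                                   # placeholder
+    jdbc_password => "MYSQL_PASSWORD"                                           # placeholder
+    jdbc_paging_enabled => true
+    tracking_column => "unix_ts_in_secs"
+    use_column_value => true
+    tracking_column_type => "numeric"
+    schedule => "*/5 * * * * *"
+    statement => "SELECT *, UNIX_TIMESTAMP(modification_time) AS unix_ts_in_secs FROM es_table WHERE (UNIX_TIMESTAMP(modification_time) > :sql_last_value AND modification_time < NOW()) ORDER BY modification_time ASC"
+  }
+}
+filter {
+  mutate {
+    copy => { "id" => "[@metadata][_id]"}
+    remove_field => ["id", "@version", "unix_ts_in_secs"]
+  }
+}
+output {
+  elasticsearch {
+    index => "rdbms_idx"
+    ilm_enabled => false
+    document_id => "%{[@metadata][_id]}"        # use the MySQL id copied by the filter
+    cloud_id => "DEPLOYMENT_NAME:CLOUD_ID"      # placeholder: use your deployment's Cloud ID
+    api_key => "API_KEY_ID:API_KEY_SECRET"      # placeholder: or use cloud_auth instead
+    ssl => true
+  }
+}
+```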
+ diff --git a/manage-data/ingest/ingesting-data-from-applications/ingest-data-with-nodejs-on-elasticsearch-service.md b/manage-data/ingest/ingesting-data-from-applications/ingest-data-with-nodejs-on-elasticsearch-service.md index 94273c800b..378831465e 100644 --- a/manage-data/ingest/ingesting-data-from-applications/ingest-data-with-nodejs-on-elasticsearch-service.md +++ b/manage-data/ingest/ingesting-data-from-applications/ingest-data-with-nodejs-on-elasticsearch-service.md @@ -4,13 +4,309 @@ mapped_urls: - https://www.elastic.co/guide/en/cloud-enterprise/current/ece-getting-started-node-js.html --- -# Ingest data with Node.js on Elasticsearch Service +# Ingest data with Node.js -% What needs to be done: Refine +This guide tells you how to get started with: -% Scope notes: Merge ESS and ECE versions (should be pretty much identical) +* Securely connecting to Elasticsearch Service with Node.js +* Ingesting data into your deployment from your application +* Searching and modifying your data on Elasticsearch Service -% Use migrated content from existing pages that map to this page: +If you are an Node.js application programmer who is new to the Elastic Stack, this content helps you get started more easily. + +*Time required: 45 minutes* + +## Create a deployment [ec_get_elasticsearch_service] + +::::{tab-set} + +:::{tab-item} Elastic Cloud Hosted +1. [Get a free trial](https://cloud.elastic.co/registration?page=docs&placement=docs-body). +2. Log into [Elastic Cloud](https://cloud.elastic.co?page=docs&placement=docs-body). +3. Select **Create deployment**. +4. Give your deployment a name. You can leave all other settings at their default values. +5. Select **Create deployment** and save your Elastic deployment credentials. You need these credentials later on. +6. When the deployment is ready, click **Continue** and a page of **Setup guides** is displayed. To continue to the deployment homepage click **I’d like to do something else**. + +Prefer not to subscribe to yet another service? You can also get Elasticsearch Service through [AWS, Azure, and GCP marketplaces](../../../deploy-manage/deploy/elastic-cloud/subscribe-from-marketplace.md). +::: + +:::{tab-item} Elastic Cloud Enterprise +1. Log into the Elastic Cloud Enterprise admin console. +2. Select **Create deployment**. +3. Give your deployment a name. You can leave all other settings at their default values. +4. Select **Create deployment** and save your Elastic deployment credentials. You need these credentials later on. +5. When the deployment is ready, click **Continue** and a page of **Setup guides** is displayed. To continue to the deployment homepage click **I’d like to do something else**. +::: + +:::: + +## Set up your application [ec_set_up_your_application] + +These steps are applicable to your existing application. If you don’t have one, use the example included here to create one. + + +### Create the npm `package.json` file [ec_create_the_npm_package_json_file] + +```sh +npm init +``` + + +### Get the `elasticsearch` and `config` packages [ec_get_the_elasticsearch_and_config_packages] + +```sh +npm install @elastic/elasticsearch +npm install config +``` + +::::{note} +The `config` package is not required if you have your own method to keep your configuration details private. +:::: + + + +### Create a configuration file [ec_create_a_configuration_file] + +```sh +mkdir config +vi config/default.json +``` + +The example here shows what the `config` package expects. 
You need to update `config/default.json` with your deployment details in the following sections: + +```json +{ + "elastic": { + "cloudID": "DEPLOYMENT_NAME:CLOUD_ID_DETAILS", <1> + "username": "elastic", + "password": "LONGPASSWORD" + } +} +``` + +1. Find your Cloud ID by going to the {{kib}} main menu and selecting Management > Integrations, and then selecting View deployment details. + + + +## About connecting securely [ec_about_connecting_securely] + +When connecting to Elasticsearch Service use a Cloud ID to specify the connection details. You must pass the Cloud ID that is found in {{kib}} or the cloud console. + +To connect to, stream data to, and issue queries with Elasticsearch Service, you need to think about authentication. Two authentication mechanisms are supported, *API key* and *basic authentication*. Here, to get you started quickly, we’ll show you how to use basic authentication, but you can also generate API keys as shown later on. API keys are safer and preferred for production environments. + + +### Basic authentication [ec_basic_authentication] + +For basic authentication, use the same deployment credentials (`username` and `password` parameters) and Cloud ID you copied down earlier when you created your deployment. (If you did not save the password, you can [reset the password](../../../deploy-manage/users-roles/cluster-or-deployment-auth/built-in-users.md) .) + + +## Create a sample application [ec_create_a_sample_application] + +The sample application connects to {{es}}, creates an index, inserts some records, performs a search, and updates a record. + +Read the configuration created earlier, and connect to {{es}}: + +```javascript +const { Client } = require('@elastic/elasticsearch') +const config = require('config'); +const elasticConfig = config.get('elastic'); + +const client = new Client({ + cloud: { + id: elasticConfig.cloudID + }, + auth: { + username: elasticConfig.username, + password: elasticConfig.password + } +}) +``` + +Now confirm that you are connected to the deployment by returning some information about the deployment: + +```javascript +client.info() + .then(response => console.log(response)) + .catch(error => console.error(error)) +``` + + +## Ingest data [ec_ingest_data] + +After connecting to your deployment, you are ready to index and search data. Let’s create a new index, insert some quotes from our favorite characters, and refresh the index so that it is ready to be searched. A refresh makes all operations performed on an index since the last refresh available for search. + +```javascript +async function run() { + await client.index({ + index: 'game-of-thrones', + body: { + character: 'Ned Stark', + quote: 'Winter is coming.' + } + }) + + await client.index({ + index: 'game-of-thrones', + body: { + character: 'Daenerys Targaryen', + quote: 'I am the blood of the dragon.' + } + }) + + await client.index({ + index: 'game-of-thrones', + body: { + character: 'Tyrion Lannister', + quote: 'A mind needs books like a sword needs whetstone.' + } + }) + + await client.indices.refresh({index: 'game-of-thrones'}) +} + +run().catch(console.log) +``` + +When using the [client.index](https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#_index) API, the request automatically creates the `game-of-thrones` index if it doesn’t already exist, as well as document IDs for each indexed document if they are not explicitly specified. 
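+
+If you prefer to control document IDs yourself, pass an `id` value in the `client.index` call. The following is a minimal sketch that reuses the `client` created earlier; the `ned-stark` ID and the function name are arbitrary values chosen for illustration:
+
+```javascript
+async function indexWithId() {
+  // Index with an explicit document ID, so that later updates and gets
+  // can reference a known ID instead of an automatically generated one.
+  await client.index({
+    index: 'game-of-thrones',
+    id: 'ned-stark',
+    body: {
+      character: 'Ned Stark',
+      quote: 'Winter is coming.'
+    }
+  })
+}
+
+indexWithId().catch(console.log)
+```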
+ + +## Search and modify data [ec_search_and_modify_data] + +After creating a new index and ingesting some data, you are now ready to search. Let’s find what characters have said things about `winter`: + +```javascript +async function read() { + const { body } = await client.search({ + index: 'game-of-thrones', + body: { + query: { + match: { quote: 'winter' } + } + } + }) + console.log(body.hits.hits) +} + +read().catch(console.log) +``` + +The search request returns content of documents containing `'winter'` in the `quote` field, including document IDs that were automatically generated. You can make updates to specific documents using document IDs. Let’s add a birthplace for our character: + +```javascript +async function update() { + await client.update({ + index: 'game-of-thrones', + id: , + body: { + script: { + source: "ctx._source.birthplace = 'Winterfell'" + } + } + }) + const { body } = await client.get({ + index: 'game-of-thrones', + id: + }) + + console.log(body) +} + +update().catch(console.log) +``` + +This [more comprehensive list of API examples](https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/examples.html) includes bulk operations, checking the existence of documents, updating by query, deleting, scrolling, and SQL queries. To learn more, check the complete [API reference](https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html). + + +## Switch to API key authentication [ec_switch_to_api_key_authentication] + +To get started, authentication to {{es}} used the `elastic` superuser and password, but an API key is much safer and a best practice for production. + +In the example that follows, an API key is created with the cluster `monitor` privilege which gives read-only access for determining the cluster state. Some additional privileges also allow `create_index`, `write`, `read`, and `manage` operations for the specified index. The index `manage` privilege is added to enable index refreshes. + +The `security.createApiKey` function returns an `id` and `api_key` value which can then be concatenated and encoded in `base64`: + +```javascript +async function generateApiKeys (opts) { + const { body } = await client.security.createApiKey({ + body: { + name: 'nodejs_example', + role_descriptors: { + 'nodejs_example_writer': { + 'cluster': ['monitor'], + 'index': [ + { + 'names': ['game-of-thrones'], + 'privileges': ['create_index', 'write', 'read', 'manage'] + } + ] + } + } + } + }) + + return Buffer.from(`${body.id}:${body.api_key}`).toString('base64') +} + +generateApiKeys() + .then(console.log) + .catch(err => { + console.error(err) + process.exit(1) +}) +``` + +The `base64` encoded output is as follows and is ready to be added to the configuration file: + +```text +API_KEY_DETAILS +``` + +Edit the `config/default.json` configuration file you created earlier and add this API key: + +```json +{ + "elastic-cloud": { + "cloudID": "DEPLOYMENT_NAME:CLOUD_ID_DETAILS", + "username": "elastic", + "password": "LONGPASSWORD", + "apiKey": "API_KEY_DETAILS" + } +} +``` + +Now the API key can be used in place of the username and password. 
The client connection becomes:
+
+```javascript
+const elasticConfig = config.get('elastic-cloud');
+
+const client = new Client({
+  cloud: {
+    id: elasticConfig.cloudID
+  },
+  auth: {
+    apiKey: elasticConfig.apiKey
+  }
+})
+```
+
+Check [Create API key API](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-api-key.html) to learn more about API keys and [Security privileges](../../../deploy-manage/users-roles/cluster-or-deployment-auth/elasticsearch-privileges.md) to understand which privileges are needed. If you are not sure what the right combination of privileges for your custom application is, you can enable [audit logging](../../../deploy-manage/monitor/logging-configuration/enabling-elasticsearch-audit-logs.md) on {{es}} to find out what privileges are being used. To learn more about how logging works on Elasticsearch Service, check [Monitoring Elastic Cloud deployment logs and metrics](https://www.elastic.co/blog/monitoring-elastic-cloud-deployment-logs-and-metrics).
+
+
+### Best practices [ec_best_practices]
+
+Security
+: When connecting to Elasticsearch Service, the client automatically enables both request and response compression by default, since it yields significant throughput improvements. Moreover, the client also sets the SSL option `secureProtocol` to `TLSv1_2_method` unless specified otherwise. You can still override this option by configuring it.
+
+  Do not enable sniffing when using Elasticsearch Service, since the nodes are behind a load balancer. Elasticsearch Service takes care of everything for you. Take a look at [Elasticsearch sniffing best practices: What, when, why, how](https://www.elastic.co/blog/elasticsearch-sniffing-best-practices-what-when-why-how) if you want to know more.
+
+
+Connections
+: If your application connecting to Elasticsearch Service runs under the Java security manager, you should at least disable the caching of positive hostname resolutions. To learn more, check the [Java API Client documentation](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/_others.html).
+
+Schema
+: When the example code is run, an index mapping is created automatically. The field types are selected by {{es}} based on the content seen when the first record is ingested, and they are updated as new fields appear in the data. It would be more efficient to specify the fields and field types in advance to optimize performance. Refer to the Elastic Common Schema documentation and Field Type documentation when you are designing the schema for your production use cases.
+
+Ingest
+: For more advanced scenarios, this [bulk ingestion](https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/bulk_examples.html) reference gives an example of the `bulk` API that makes it possible to perform multiple operations in a single call. This bulk example also explicitly specifies document IDs. If you have a lot of documents to index, using bulk to batch document operations is significantly faster than submitting requests individually. A minimal sketch of a bulk request follows this list.
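+
+The following sketch is not part of the original example. It assumes the same `client` and `game-of-thrones` index used above, and the bulk helper that ships with recent 7.x and 8.x versions of the JavaScript client. The documents are illustrative:
+
+```javascript
+const docs = [
+  { character: 'Arya Stark', quote: 'A girl has no name.' },
+  { character: 'Jon Snow', quote: 'Winter is coming.' }
+]
+
+async function bulkIngest () {
+  // The bulk helper batches the documents into bulk requests for you.
+  const result = await client.helpers.bulk({
+    datasource: docs,
+    onDocument (doc) {
+      // Return the bulk action for each document; here, a plain index action.
+      return { index: { _index: 'game-of-thrones' } }
+    }
+  })
+  console.log(result) // summary statistics: total, successful, failed, and so on
+}
+
+bulkIngest().catch(console.log)
+```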
-% - [ ] ./raw-migrated-files/cloud/cloud/ec-getting-started-node-js.md -% - [ ] ./raw-migrated-files/cloud/cloud-enterprise/ece-getting-started-node-js.md \ No newline at end of file diff --git a/manage-data/ingest/ingesting-data-from-applications/ingest-data-with-python-on-elasticsearch-service.md b/manage-data/ingest/ingesting-data-from-applications/ingest-data-with-python-on-elasticsearch-service.md index 1734c7e22c..ecb9630fb8 100644 --- a/manage-data/ingest/ingesting-data-from-applications/ingest-data-with-python-on-elasticsearch-service.md +++ b/manage-data/ingest/ingesting-data-from-applications/ingest-data-with-python-on-elasticsearch-service.md @@ -4,13 +4,369 @@ mapped_urls: - https://www.elastic.co/guide/en/cloud-enterprise/current/ece-getting-started-python.html --- -# Ingest data with Python on Elasticsearch Service +# Ingest data with Python on Elastic Cloud -% What needs to be done: Refine +This guide tells you how to get started with: -% Scope notes: Merge ESS and ECE versions (should be pretty much identical) +* Securely connecting to Elasticsearch Service with Python +* Ingesting data into your deployment from your application +* Searching and modifying your data on Elasticsearch Service -% Use migrated content from existing pages that map to this page: +If you are a Python application programmer who is new to the Elastic Stack, this content can help you get started more easily. + +*Time required: 45 minutes* + + +## Prerequisites [ec_prerequisites] + +These steps are applicable to your existing application. If you don’t have one, you can use the example included here to create one. + + +### Get the `elasticsearch` packages [ec_get_the_elasticsearch_packages] + +```sh +python -m pip install elasticsearch +python -m pip install elasticsearch-async +``` + + +### Create the `setup.py` file [ec_create_the_setup_py_file] + +```sh +# Elasticsearch 7.x +elasticsearch>=7.0.0,<8.0.0 +``` + +## Create a deployment [ec_get_elasticsearch_service_2] + +::::{tab-set} + +:::{tab-item} Elastic Cloud Hosted +1. [Get a free trial](https://cloud.elastic.co/registration?page=docs&placement=docs-body). +2. Log into [Elastic Cloud](https://cloud.elastic.co?page=docs&placement=docs-body). +3. Select **Create deployment**. +4. Give your deployment a name. You can leave all other settings at their default values. +5. Select **Create deployment** and save your Elastic deployment credentials. You need these credentials later on. +6. When the deployment is ready, click **Continue** and a page of **Setup guides** is displayed. To continue to the deployment homepage click **I’d like to do something else**. + +Prefer not to subscribe to yet another service? You can also get Elasticsearch Service through [AWS, Azure, and GCP marketplaces](../../../deploy-manage/deploy/elastic-cloud/subscribe-from-marketplace.md). +::: + +:::{tab-item} Elastic Cloud Enterprise +1. Log into the Elastic Cloud Enterprise admin console. +2. Select **Create deployment**. +3. Give your deployment a name. You can leave all other settings at their default values. +4. Select **Create deployment** and save your Elastic deployment credentials. You need these credentials later on. +5. When the deployment is ready, click **Continue** and a page of **Setup guides** is displayed. To continue to the deployment homepage click **I’d like to do something else**. +::: + +:::: + +## Connect securely [ec_connect_securely] + +When connecting to Elasticsearch Service you need to use your Cloud ID to specify the connection details. 
Find your Cloud ID by going to the {{kib}} main menu and selecting Management > Integrations, and then selecting View deployment details. + +To connect to, stream data to, and issue queries with Elasticsearch Service, you need to think about authentication. Two authentication mechanisms are supported, *API key* and *basic authentication*. Here, to get you started quickly, we’ll show you how to use basic authentication, but you can also generate API keys as shown later on. API keys are safer and preferred for production environments. + + +### Basic authentication [ec_basic_authentication_2] + +For basic authentication, use the same deployment credentials (`username` and `password` parameters) and Cloud ID you copied down earlier. Find your Cloud ID by going to the {{kib}} main menu and selecting Management > Integrations, and then selecting View deployment details. (If you did not save the password, you can [reset the password](../../../deploy-manage/users-roles/cluster-or-deployment-auth/built-in-users.md) .) + +You first need to create and edit an `example.ini` file with your deployment details: + +```sh +[ELASTIC] +cloud_id = DEPLOYMENT_NAME:CLOUD_ID_DETAILS +user = elastic +password = LONGPASSWORD +``` + +The following examples are to be typed into the Python interpreter in interactive mode. The prompts have been removed to make it easier for you to copy the samples, the output from the interpreter is shown unmodified. + + +### Import libraries and read in the configuration [ec_import_libraries_and_read_in_the_configuration] + +```python +❯ python3 +Python 3.9.6 (default, Jun 29 2021, 05:25:02) +[Clang 12.0.5 (clang-1205.0.22.9)] on darwin +Type "help", "copyright", "credits" or "license" for more information. + +from elasticsearch import Elasticsearch, helpers +import configparser + +config = configparser.ConfigParser() +config.read('example.ini') +``` + + +#### Output [ec_output] + +```python +['example.ini'] +>>> +``` + + +### Instantiate the {{es}} connection [ec_instantiate_the_es_connection] + +```python +es = Elasticsearch( + cloud_id=config['ELASTIC']['cloud_id'], + http_auth=(config['ELASTIC']['user'], config['ELASTIC']['password']) +) +``` + +You can now confirm that you have connected to the deployment by returning some information about the deployment: + +```python +es.info() +``` + + +#### Output [ec_output_2] + +```python +{'name': 'instance-0000000000', + 'cluster_name': '747ab208fb70403dbe3155af102aef56', + 'cluster_uuid': 'IpgjkPkVQ5efJY-M9ilG7g', + 'version': {'number': '7.15.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '79d65f6e357953a5b3cbcc5e2c7c21073d89aa29', 'build_date': '2021-09-16T03:05:29.143308416Z', 'build_snapshot': False, 'lucene_version': '8.9.0', 'minimum_wire_compatibility_version': '6.8.0', 'minimum_index_compatibility_version': '6.0.0-beta1'}, + 'tagline': 'You Know, for Search'} +``` + + +## Ingest data [ec_ingest_data_2] + +After connecting to your deployment, you are ready to index and search data. Let’s create a new index, insert some quotes from our favorite characters, and then refresh the index so that it is ready to be searched. A refresh makes all operations performed on an index since the last refresh available for search. + + +### Index a document [ec_index_a_document] + +```python +es.index( + index='lord-of-the-rings', + document={ + 'character': 'Aragon', + 'quote': 'It is not this day.' 
+ }) +``` + + +#### Output [ec_output_3] + +```python +{'_index': 'lord-of-the-rings', + '_type': '_doc', + '_id': 'IanWEnwBg_mH2XweqDqg', + '_version': 1, + 'result': 'created', + '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 34, '_primary_term': 1} +``` + + +### Index another record [ec_index_another_record] + +```python +es.index( + index='lord-of-the-rings', + document={ + 'character': 'Gandalf', + 'quote': 'A wizard is never late, nor is he early.' + }) +``` + + +#### Output [ec_output_4] + +```python +{'_index': 'lord-of-the-rings', + '_type': '_doc', + '_id': 'IqnWEnwBg_mH2Xwezjpj', + '_version': 1, + 'result': 'created', + '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 35, '_primary_term': 1} +``` + + +### Index a third record [ec_index_a_third_record] + +```python +es.index( + index='lord-of-the-rings', + document={ + 'character': 'Frodo Baggins', + 'quote': 'You are late' + }) +``` + + +#### Output [ec_output_5] + +```python +{'_index': 'lord-of-the-rings', + '_type': '_doc', + '_id': 'I6nWEnwBg_mH2Xwe_Tre', + '_version': 1, + 'result': 'created', + '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 36, '_primary_term': 1} +``` + + +### Refresh the index [ec_refresh_the_index] + +```python +es.indices.refresh(index='lord-of-the-rings') +``` + + +#### Output [ec_output_6] + +```python +{'_shards': {'total': 2, 'successful': 1, 'failed': 0}} +``` + +When using the `es.index` API, the request automatically creates the `lord-of-the-rings` index, if it doesn’t exist already, as well as document IDs for each indexed document if they are not explicitly specified. + + +## Search and modify data [ec_search_and_modify_data_2] + +After creating a new index and ingesting some data, you are now ready to search. Let’s find what different characters have said things about being `late`: + +```python +result = es.search( + index='lord-of-the-rings', + query={ + 'match': {'quote': 'late'} + } + ) + +result['hits']['hits'] +``` + + +### Output [ec_output_7] + +```python +[{'_index': 'lord-of-the-rings', + '_type': '_doc', + '_id': '2EkAzngB_pyHD3p65UMt', + '_score': 0.5820575, + '_source': {'character': 'Frodo Baggins', 'quote': 'You are late'}}, + {'_index': 'lord-of-the-rings', + '_type': '_doc', + '_id': '10kAzngB_pyHD3p65EPR', + '_score': 0.37883914, + '_source': {'character': 'Gandalf', + 'quote': 'A wizard is never late, nor is he early.'}}] +``` + +The search request returns content of documents containing `late` in the quote field, including document IDs that were automatically generated. + +You can make updates to specific documents using document IDs. Let’s add a birthplace for our character: + +```python +es.update( + index='lord-of-the-rings', + id='2EkAzngB_pyHD3p65UMt', <1> + doc={'birthplace': 'The Shire'} + ) +``` + +1. This update example uses the field `id` to identify the document to update. Copy the `id` from the document related to `Frodo Baggins` when you update and add the `birthplace`. + + + +### Output [ec_output_8] + +```python +es.get(index='lord-of-the-rings', id='2EkAzngB_pyHD3p65UMt') +{'_index': 'lord-of-the-rings', + '_type': '_doc', + '_id': '2EkAzngB_pyHD3p65UMt', + '_version': 2, + '_seq_no': 3, + '_primary_term': 1, + 'found': True, + '_source': {'character': 'Frodo Baggins', + 'quote': 'You are late', + 'birthplace': 'The Shire'}} +``` + +For frequently used API calls with the Python client, check [Examples](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/examples.html). 
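+
+The `helpers` module imported at the start of this example also supports bulk indexing, which batches many documents into a single request. The following is a minimal sketch, not part of the original walkthrough; the documents are illustrative and reuse the same index:
+
+```python
+actions = [
+    {"_index": "lord-of-the-rings", "_source": {"character": "Samwise Gamgee", "quote": "Po-ta-toes."}},
+    {"_index": "lord-of-the-rings", "_source": {"character": "Legolas", "quote": "A red sun rises."}},
+]
+
+# helpers.bulk returns a tuple: (number of successfully indexed documents, list of errors).
+helpers.bulk(es, actions)
+```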
+ + +## Switch to API key authentication [ec_switch_to_api_key_authentication_2] + +To get started, authentication to Elasticsearch used the `elastic` superuser and password, but an API key is much safer and a best practice for production. + +In the example that follows, an API key is created with the cluster `monitor` privilege which gives read-only access for determining the cluster state. Some additional privileges also allow `create_index`, `write`, `read`, and `manage` operations for the specified index. The index `manage` privilege is added to enable index refreshes. + +The easiest way to create this key is in the API console for your deployment. Select the deployment name and go to **☰** > **Management** > **Dev Tools**: + +```json +POST /_security/api_key +{ + "name": "python_example", + "role_descriptors": { + "python_read_write": { + "cluster": ["monitor"], + "index": [ + { + "names": ["test-index"], + "privileges": ["create_index", "write", "read", "manage"] + } + ] + } + } +} +``` + + +### The output is: [ec_the_output_is] + +```json +{ + "id" : "API_KEY_ID", + "name" : "python_example", + "api_key" : "API_KEY_DETAILS" +} +``` + +Edit the `example.ini` file you created earlier and add the `id` and `api_key` you just created. You should also remove the lines for `user` and `password` you added earlier after you have tested the `api_key`, and consider changing the `elastic` password using the [Elasticsearch Service Console](https://cloud.elastic.co?page=docs&placement=docs-body). + +```sh +[DEFAULT] +cloud_id = DEPLOYMENT_NAME:CLOUD_ID_DETAILS +apikey_id = API_KEY_ID +apikey_key = API_KEY_DETAILS +``` + +You can now use the API key in place of a username and password. The client connection becomes: + +```python +es = Elasticsearch( + cloud_id=config['DEFAULT']['cloud_id'], + api_key=(config['DEFAULT']['apikey_id'], config['DEFAULT']['apikey_key']), +) +``` + +Check [Create API key API](https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-api-key.html) to learn more about API Keys and [Security privileges](../../../deploy-manage/users-roles/cluster-or-deployment-auth/elasticsearch-privileges.md) to understand which privileges are needed. If you are not sure what the right combination of privileges for your custom application is, you can enable [audit logging](../../../deploy-manage/monitor/logging-configuration/enabling-elasticsearch-audit-logs.md) on {{es}} to find out what privileges are being used. To learn more about how logging works on Elasticsearch Service, check [Monitoring Elastic Cloud deployment logs and metrics](https://www.elastic.co/blog/monitoring-elastic-cloud-deployment-logs-and-metrics). + +For more information on refreshing an index, searching, updating, and deleting, check the [elasticsearch-py examples](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/examples.html). + + +### Best practices [ec_best_practices_2] + +Security +: When connecting to Elasticsearch Service, the client automatically enables both request and response compression by default, since it yields significant throughput improvements. Moreover, the client also sets the SSL option `secureProtocol` to `TLSv1_2_method` unless specified otherwise. You can still override this option by configuring it. + + Do not enable sniffing when using Elasticsearch Service, since the nodes are behind a load balancer. Elasticsearch Service takes care of everything for you. 
Take a look at [Elasticsearch sniffing best practices: What, when, why, how](https://www.elastic.co/blog/elasticsearch-sniffing-best-practices-what-when-why-how) if you want to know more. + + +Schema +: When the example code is run, an index mapping is created automatically. The field types are selected by {{es}} based on the content seen when the first record was ingested, and updated as new fields appeared in the data. It would be more efficient to specify the fields and field types in advance to optimize performance. Refer to the Elastic Common Schema documentation and Field Type documentation when you design the schema for your production use cases. + +Ingest +: For more advanced scenarios, [Bulk helpers](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/client-helpers.html#bulk-helpers) gives examples for the `bulk` API that makes it possible to perform multiple operations in a single call. If you have a lot of documents to index, using bulk to batch document operations is significantly faster than submitting requests individually. -% - [ ] ./raw-migrated-files/cloud/cloud/ec-getting-started-python.md -% - [ ] ./raw-migrated-files/cloud/cloud-enterprise/ece-getting-started-python.md \ No newline at end of file diff --git a/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-nodejs-web-application-using-filebeat.md b/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-nodejs-web-application-using-filebeat.md index f340d4bb66..d59d711ce7 100644 --- a/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-nodejs-web-application-using-filebeat.md +++ b/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-nodejs-web-application-using-filebeat.md @@ -47,4 +47,525 @@ $$$ece-node-logs-prerequisites$$$ $$$ece-node-logs-send-ess$$$ -$$$ece-node-logs-view-kibana$$$ \ No newline at end of file +$$$ece-node-logs-view-kibana$$$ + +This guide demonstrates how to ingest logs from a Node.js web application and deliver them securely into an Elasticsearch Service deployment. You’ll set up Filebeat to monitor a JSON-structured log file that has standard Elastic Common Schema (ECS) formatted fields, and you’ll then view real-time visualizations of the log events in Kibana as requests are made to the Node.js server. While Node.js is used for this example, this approach to monitoring log output is applicable across many client types. Check the list of [available ECS logging plugins](https://www.elastic.co/guide/en/ecs-logging/overview/{{ecs-logging}}/intro.html#_get_started). + +*Time required: 1.5 hours* + + +## Prerequisites [ec-node-logs-prerequisites] + +To complete these steps you need the following applications installed on your system: + +* [Node.js](https://nodejs.org/) - You will set up a simple Node.js web server and client application. Check the Node.js download page for installation instructions. + +::::{tip} +For the three following packages, you can create a working directory to install the packages using the Node package manager (NPM). Then, you can run your Node.js webserver and client from the same directory so that it can use the packages. Alternatively, you can also install the Node packages globally by running the Node package install commands with the `-g` option. Refer to the NPM [package installation instructions](https://docs.npmjs.com/downloading-and-installing-packages-globally) for details. 
+:::: + + +* [winston](https://www.npmjs.com/package/winston) - This is a popular logging package for Node.js. Create a new, local directory and run the following command to install winston in it: + + ```sh + npm install winston + ``` + +* The [Elastic Common Schema (ECS) formatter](https://www.elastic.co/guide/en/ecs-logging/nodejs/{{ecs-logging-nodejs}}/winston.html) for the Node.js winston logger - This plugin formats your Node.js logs into an ECS structured JSON format ideally suited for ingestion into Elasticsearch. To install the ECS winston logger, run the following command in your working directory so that the package is installed in the same location as the winston package: + + ```sh + npm install @elastic/ecs-winston-format + ``` + +* [Got](https://www.npmjs.com/package/got) - Got is a "Human-friendly and powerful HTTP request library for Node.js." - This plugin can be used to query the sample web server used in the tutorial. To install the Got package, run the following command in your working directory: + + ```sh + npm install got + ``` + + +## Create a deployment [ec-node-logs-trial] + +::::{tab-set} + +:::{tab-item} Elastic Cloud Hosted +1. [Get a free trial](https://cloud.elastic.co/registration?page=docs&placement=docs-body). +2. Log into [Elastic Cloud](https://cloud.elastic.co?page=docs&placement=docs-body). +3. Select **Create deployment**. +4. Give your deployment a name. You can leave all other settings at their default values. +5. Select **Create deployment** and save your Elastic deployment credentials. You need these credentials later on. +6. When the deployment is ready, click **Continue** and a page of **Setup guides** is displayed. To continue to the deployment homepage click **I’d like to do something else**. + +Prefer not to subscribe to yet another service? You can also get Elasticsearch Service through [AWS, Azure, and GCP marketplaces](../../../deploy-manage/deploy/elastic-cloud/subscribe-from-marketplace.md). +::: + +:::{tab-item} Elastic Cloud Enterprise +1. Log into the Elastic Cloud Enterprise admin console. +2. Select **Create deployment**. +3. Give your deployment a name. You can leave all other settings at their default values. +4. Select **Create deployment** and save your Elastic deployment credentials. You need these credentials later on. +5. When the deployment is ready, click **Continue** and a page of **Setup guides** is displayed. To continue to the deployment homepage click **I’d like to do something else**. +::: + +:::: + +## Connect securely [ec-node-logs-connect-securely] + +When connecting to Elasticsearch Service you can use a Cloud ID to specify the connection details. Find your Cloud ID by going to the {{kib}} main menu and selecting Management > Integrations, and then selecting View deployment details. + +To connect to, stream data to, and issue queries with Elasticsearch Service, you need to think about authentication. Two authentication mechanisms are supported, *API key* and *basic authentication*. Here, to get you started quickly, we’ll show you how to use basic authentication, but you can also generate API keys as shown later on. API keys are safer and preferred for production environments. + + +## Create a Node.js web application with logging [ec-node-logs-create-server-script] + +Next, create a basic Node.js script that runs a web server and logs HTTP requests. + +1. 
In the same local directory where you installed the winston and ECS formatter packages, create a new file *webserver.js* and save it with these contents: + + ```javascript + const http = require('http') + const winston = require('winston') + const ecsFormat = require('@elastic/ecs-winston-format') + + const logger = winston.createLogger({ + level: 'debug', + format: ecsFormat({ convertReqRes: true }), + transports: [ + //new winston.transports.Console(), + new winston.transports.File({ + //path to log file + filename: 'logs/log.json', + level: 'debug' + }) + ] + }) + + const server = http.createServer(handler) + server.listen(3000, () => { + logger.info('listening at http://localhost:3000') + }) + + function handler (req, res) { + res.setHeader('Foo', 'Bar') + res.end('ok') + logger.info('handled request', { req, res }) + } + ``` + + This Node.js script runs a web server at `http://localhost:3000` and uses the winston logger to send logging events, based on HTTP requests, to the file `log.json`. + +2. Try a test run of the Node.js script: + + ```sh + node webserver.js + ``` + +3. With the script running, open a web browser to `http://localhost:3000` and there should be a simple `ok` message. +4. In the directory where you created `webserver.js`, you should now find a newly created `log.json` file. Open the file and check the contents. There should be one log entry indicating that Node.js is listening on the localhost port, and another entry for the HTTP request from when you opened `localhost` in your browser. + + Leave `webserver.js` running for now and we’ll send it some HTTP requests. + + + +## Create a Node.js HTTP request application [ec-node-logs-create-request-script] + +In this step, you’ll create a Node.js application that sends HTTP requests to your web server. + +1. 
In your working directory, create a file `webrequests.js` and save it with these contents: + + ```javascript + const got = require('got'); + + const addresses = [ + 'aardvark@the.zoo', + 'crocodile@the.zoo', + 'elephant@the.zoo', + 'emu@the.zoo', + 'hippopotamus@the.zoo', + 'llama@the.zoo', + 'octopus@the.zoo', + 'otter@the.zoo', + 'panda@the.zoo', + 'pangolin@the.zoo', + 'tortoise@the.zoo', + 'walrus@the.zoo' + ]; + + const method = [ + 'get', + 'put', + 'post' + ]; + + async function sleep(millis) { + return new Promise(resolve => setTimeout(resolve, millis)); + } + + (async () => { + while (true) { + var type = Math.floor(Math.random() * method.length); + var email = Math.floor(Math.random() * addresses.length); + var sleeping = Math.floor(Math.random() * 9) + 1; + + switch (method[type]) { + case 'get': + try { + const response = await got.get('http://localhost:3000/', { + headers: { + from: addresses[email] + } + }).json(); + console.log(response.body); + } catch (error) { + console.log(error.response.body); + } + break; // end case 'get' + case 'put': + try { + const response = await got.put('http://localhost:3000/', { + headers: { + from: addresses[email] + } + }).json(); + console.log(response.body); + } catch (error) { + console.log(error.response.body); + } + break; // end case 'put' + case 'post': + try { + const { + data + } = await got.post('http://localhost:3000/', { + headers: { + from: addresses[email] + } + }).json(); + console.log(data); + } catch (error) { + console.log(error.response.body); + } + break; // end case 'post' + } // end switch on method + await sleep(sleeping * 1000); + } + })(); + ``` + + This Node.js app generates HTTP requests with a random method of type `GET`, `POST`, or `PUT`, and a random `from` request header using various pretend email addresses. The requests are sent at random intervals between 1 and 10 seconds. + + The [Got package](https://www.npmjs.com/package/got) is used to send the requests, and they are directed to your web server at `http://localhost:3000`. To learn about sending custom headers such as the `from` field used in this example, check [headers](https://github.com/sindresorhus/got/blob/0fb6ec60d299fd9b48966608a4c3f201746d821c/documentation/2-options.md#headers) in the Got documentation. + +2. In a new terminal window, give the Node.js script a trial run: + + ```sh + node webrequests.js + ``` + +3. After the script has run for about 30 seconds, enter *CTRL + C* to stop it. Have a look at your Node.js `logs/log.json` file. It should contain some entries like this one: + + ```json + {"@timestamp":"2021-09-09T18:42:20.799Z","log.level":"info","message":"handled request","ecs":{"version":"1.6.0"},"http":{"version":"1.1","request":{"method":"POST","headers":{"user-agent":"got (https://github.com/sindresorhus/got)","from":"octopus@the.zoo","accept":"application/json","accept-encoding":"gzip, deflate, br","host":"localhost:3000","connection":"close","content-length":"0"},"body":{"bytes":0}},"response":{"status_code":200,"headers":{"foo":"Bar"}}},"url":{"path":"/","full":"http://localhost:3000/"},"client":{"address":"::ffff:127.0.0.1","ip":"::ffff:127.0.0.1","port":49930},"user_agent":{"original":"got (https://github.com/sindresorhus/got)"}} + ``` + + Each log entry contains details of the HTTP request. In particular, in this example you can find the timestamp of the request, a request method of type `PUT`, and a request `from` header with the email address `octopus@the.zoo`. 
Your example will likely be a bit different since the request type and the email address are generated randomly. + + Having your logs written in a JSON format with ECS fields allows for easy parsing and analysis, and for standardization with other applications. A standard, easily parsible format becomes increasingly important as the volume and type of data captured in your logs expands over time. + +4. After confirming that both `webserver.js` and `webrequests.js` run as expected, enter *CTRL + C* to stop the Node.js script, and also delete `log.json`. + + +## Set up Filebeat [ec-node-logs-filebeat] + +Filebeat offers a straightforward, easy to configure way to monitor your Node.js log files and port the log data into Elasticsearch Service. + +**Get Filebeat** + +[Download Filebeat](https://www.elastic.co/downloads/beats/filebeat) and unpack it on the local server from which you want to collect data. + +**Configure Filebeat to access Elasticsearch Service** + +In */filebeat-/* (where ** is the directory where Filebeat is installed and ** is the Filebeat version number), open the *filebeat.yml* configuration file for editing. + +```txt +# =============================== Elastic Cloud ================================ + +# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/). + +# The cloud.id setting overwrites the `output.elasticsearch.hosts` and +# `setup.kibana.host` options. +# You can find the `cloud.id` in the Elastic Cloud web UI. +cloud.id: my-deployment:yTMtd5VzdKEuP2NwPbNsb3VkLtKzLmldJDcyMzUyNjBhZGP7MjQ4OTZiNTIxZTQyOPY2C2NeOGQwJGQ2YWQ4M5FhNjIyYjQ9ODZhYWNjKDdlX2Yz4ELhRYJ7 <1> + +# The cloud.auth setting overwrites the `output.elasticsearch.username` and +# `output.elasticsearch.password` settings. The format is `:`. +cloud.auth: elastic:591KhtuAgTP46by9C4EmhGuk <2> +``` + +1. Uncomment the `cloud.id` line and add the deployment’s Cloud ID. You can include or omit the *:* prefix at the beginning of the Cloud ID. Both versions work fine. Find your Cloud ID by going to the {{kib}} main menu and selecting Management > Integrations, and then selecting View deployment details. +2. Uncomment the `cloud.auth` line and add the username and password for your deployment that you recorded when you created your deployment. The format is *:*, for example *elastic:57ugj782kvkwmSKg8uVe*. + + +**Configure Filebeat inputs** + +Filebeat has several ways to collect logs. For this example, you’ll configure log collection manually. + +In the *filebeat.inputs* section of *filebeat.yml*, set *enabled:* to *true*, and set *paths:* to the location of your web server log file. In this example, set the same directory where you saved *webserver.js*: + +```txt +filebeat.inputs: + +# Each - is an input. Most options can be set at the input level, so +# you can use different inputs for various configurations. +# Below are the input specific configurations. + +- type: log + + # Change to true to enable this input configuration. + enabled: true + + # Paths that should be crawled and fetched. Glob based paths. + paths: + - /path/to/logs/log.json +``` + +::::{tip} +You can specify a wildcard (***) character to indicate that all log files in the specified directory should be read. You can also use a wildcard to read logs from multiple directories. For example `/var/log/*/*.log`. +:::: + + +**Add the JSON input options** + +Filebeat’s input configuration options include several settings for decoding JSON messages. 
Log files are decoded line by line, so it’s important that they contain one JSON object per line. + +For this example, Filebeat uses the following four decoding options. + +```txt + json.keys_under_root: true + json.overwrite_keys: true + json.add_error_key: true + json.expand_keys: true +``` + +To learn more about these settings, check [JSON input configuration options](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#filebeat-input-log-config-json) and [Decode JSON fields](https://www.elastic.co/guide/en/beats/filebeat/current/decode-json-fields.html) in the Filebeat Reference. + +Append the four JSON decoding options to the *Filebeat inputs* section of *filebeat.yml*, so that the section now looks like this: + +```yaml +# ============================== Filebeat inputs =============================== + +filebeat.inputs: + +# Each - is an input. Most options can be set at the input level, so +# you can use different inputs for various configurations. +# Below are the input specific configurations. + +- type: log + + # Change to true to enable this input configuration. + enabled: true + + # Paths that should be crawled and fetched. Glob based paths. + paths: + - /path/to/logs/log.json + json.keys_under_root: true + json.overwrite_keys: true + json.add_error_key: true + json.expand_keys: true +``` + +**Finish setting up Filebeat** + +Filebeat comes with predefined assets for parsing, indexing, and visualizing your data. To load these assets, run the following from the Filebeat installation directory: + +```txt +./filebeat setup -e +``` + +::::{important} +Depending on variables including the installation location, environment, and local permissions, you might need to [change the ownership](https://www.elastic.co/guide/en/beats/libbeat/current/config-file-permissions.html) of filebeat.yml. You can also try running the command as *root*: *sudo ./filebeat setup -e* or you can disable strict permission checks by running the command with the `--strict.perms=false` option. +:::: + + +The setup process takes a couple of minutes. If everything goes successfully you should get a confirmation message: + +```txt +Loaded Ingest pipelines +``` + +The Filebeat data view is now available in Elasticsearch. To verify: + +1. [Login to Kibana](../../../deploy-manage/deploy/elastic-cloud/access-kibana.md). +2. Open the {{kib}} main menu and select **Management** > **{{kib}}** > **Data views**. +3. In the search bar, search for *filebeat*. You should get *filebeat-** in the search results. + +**Optional: Use an API key to authenticate** + +For additional security, instead of using basic authentication you can generate an Elasticsearch API key through the Elasticsearch Service console, and then configure Filebeat to use the new key to connect securely to the Elasticsearch Service deployment. + +1. Log in to the [Elasticsearch Service Console](https://cloud.elastic.co?page=docs&placement=docs-body). +2. Select the deployment name and go to **☰** > **Management** > **Dev Tools**. +3. 
Enter the following request: + + ```json + POST /_security/api_key + { + "name": "filebeat-api-key", + "role_descriptors": { + "logstash_read_write": { + "cluster": ["manage_index_templates", "monitor"], + "index": [ + { + "names": ["filebeat-*"], + "privileges": ["create_index", "write", "read", "manage"] + } + ] + } + } + } + ``` + + This creates an API key with the cluster `monitor` privilege which gives read-only access for determining the cluster state, and `manage_index_templates` which allows all operations on index templates. Some additional privileges also allow `create_index`, `write`, and `manage` operations for the specified index. The index `manage` privilege is added to enable index refreshes. + +4. Click **▶**. The output should be similar to the following: + + ```json + { + "api_key": "tV1dnfF-GHI59ykgv4N0U3", + "id": "2TBR42gBabmINotmvZjv", + "name": "filebeat-api-key" + } + ``` + +5. Add your API key information to the *Elasticsearch Output* section of *filebeat.yml*, just below *output.elasticsearch:*. Use the format `:`. If your results are as shown in this example, enter `2TBR42gBabmINotmvZjv:tV1dnfF-GHI59ykgv4N0U3`. +6. Add a pound (`#`) sign to comment out the *cloud.auth: elastic:* line, since Filebeat will use the API key instead of the deployment username and password to authenticate. + + ```txt + # =============================== Elastic Cloud ================================ + + # These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/). + + # The cloud.id setting overwrites the `output.elasticsearch.hosts` and + # `setup.kibana.host` options. + # You can find the `cloud.id` in the Elastic Cloud web UI. + cloud.id: my-deployment:yTMtd5VzdKEuP2NwPbNsb3VkLtKzLmldJDcyMzUyNjBhZGP7MjQ4OTZiNTIxZTQyOPY2C2NeOGQwJGQ2YWQ4M5FhNjIyYjQ9ODZhYWNjKDdlX2Yz4ELhRYJ7 + + # The cloud.auth setting overwrites the `output.elasticsearch.username` and + # `output.elasticsearch.password` settings. The format is `:`. + #cloud.auth: elastic:591KhtuAgTP46by9C4EmhGuk + + # ================================== Outputs =================================== + + # Configure what output to use when sending the data collected by the beat. + + # ---------------------------- Elasticsearch Output ---------------------------- + output.elasticsearch: + # Array of hosts to connect to. + api_key: "2TBR42gBabmINotmvZjv:tV1dnfF-GHI59ykgv4N0U3" + ``` + + + +## Send the Node.js logs to Elasticsearch [ec-node-logs-send-ess] + +It’s time to send some log data into {{es}}! + +**Launch Filebeat and webserver.js** + +Launch Filebeat by running the following from the Filebeat installation directory: + +```txt +./filebeat -e -c filebeat.yml +``` + +In this command: + +* The *-e* flag sends output to the standard error instead of the configured log output. +* The *-c* flag specifies the path to the Filebeat config file. + +::::{note} +Just in case the command doesn’t work as expected, check the [Filebeat quick start](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-installation-configuration.html#start) for the detailed command syntax for your operating system. You can also try running the command as *root*: *sudo ./filebeat -e -c filebeat.yml*. +:::: + + +Filebeat should now be running and monitoring the contents of *log.json*, which actually doesn’t exist yet. So, let’s create it. 
Open a new terminal instance and run the *webserver.js* Node.js script: + +```sh +node webserver.js +``` + +Next, run the Node.js `webrequests.js` script to send random requests to the Node.js web server. + +```sh +node webrequests.js +``` + +Let the script run for a few minutes and maybe brew up a quick coffee or tea ☕ . After that, make sure that the *log.json* file is generated as expected and is populated with several log entries. + +**Verify the log entries in Elasticsearch Service** + +The next step is to confirm that the log data has successfully found it’s way into Elasticsearch Service. + +1. [Login to Kibana](../../../deploy-manage/deploy/elastic-cloud/access-kibana.md). +2. Open the {{kib}} main menu and select **Management** > **{{kib}}** > **Data views**. +3. In the search bar, search for *filebeat*. You should get *filebeat-** in the search results. +4. Select *filebeat-**. + +The filebeat data view shows a list of fields and their details. + + +## Create log visualizations in Kibana [ec-node-logs-view-kibana] + +Now it’s time to create visualizations based off of the application log data. + +1. Open the Kibana main menu and select **Dashboard**, then **Create dashboard**. +2. Select **Create visualization**. The [Lens](../../../explore-analyze/visualize/lens.md) visualization editor opens. +3. In the data view dropdown box, select **filebeat-***, if it isn’t already selected. +4. In the **CHART TYPE** dropdown box, select **Bar vertical stacked**, if it isn’t already selected. +5. Check that the [time filter](../../../explore-analyze/query-filter/filtering.md) is set to **Last 15 minutes**. +6. From the **Available fields** list, drag and drop the **@timestamp** field onto the visualization builder. +7. Drag and drop the **http.request.method** field onto the visualization builder. +8. A stacked bar chart now shows the relative frequency of each of the three request methods used in our example, measured over time. + + ![A screen capture of the Kibana "Bar vertical stacked" visualization with several bars. The X axis shows "Count of records" and the Y axis shows "@timestamp per 30 seconds". Each bar is divided into three HTTP request methods: GET](../../../images/cloud-ec-node-logs-methods.png "") + +9. Select **Save and return** to add this visualization to your dashboard. + +Let’s create a second visualization. + +1. Select **Create visualization**. +2. Again, make sure that **CHART TYPE** is set to **Bar vertical stacked**. +3. From the **Available fields** list, drag and drop the **@timestamp** field onto the visualization builder. +4. Drag and drop the **http.request.headers.from** field onto the visualization builder. +5. In the chart settings area, under **Break down by**, select **Top values of http.request.headers.from** and set **Number of values** to *12*. In this example there are twelve different email addresses used in the HTTP *from* header, so this parameter sets all of them to appear in the chart legend. +6. Select **Refresh**. A stacked bar chart now shows the relative frequency of each of the HTTP *from* headers over time. + + ![A screen capture of the visualization builder](../../../images/cloud-ec-node-logs-content.png "") + +7. Select **Save and return** to add this visualization to your dashboard. + +And now for the final visualization. + +1. Select **Create visualization**. +2. In the **CHART TYPE** dropdown box, select **Donut**. +3. From the list of available fields, drag and drop the **http.request.method** field onto the visualization builder. 
A donut chart appears. + + ![A screen capture of a donut chart divided into three sections](../../../images/cloud-ec-node-logs-donut.png "") + +4. Select **Save and return** to add this visualization to your dashboard. +5. Select **Save** and add a title to save your new dashboard. + +You now have a Kibana dashboard with three visualizations: a stacked bar chart showing the frequency of each HTTP request method over time, another stacked bar chart showing the frequency of various HTTP *from* headers over time, and a donut chart showing the relative frequency of each HTTP request method type. + +You can add titles to the visualizations, resize and position them as you like, and then save your changes. + +**View log data updates in real time** + +1. Select **Refresh** on the Kibana dashboard. Since the application `webrequests.js` continues to run and send HTTP requests to the Node.js server, `webserver.js` continues to generate log data, and your Kibana visualizations update with that data with each page refresh. + + ![A screen capture of the completed Kibana dashboard](../../../images/cloud-ec-node-logs-final-dashboard.png "") + +2. As your final step, remember to stop Filebeat, the Node.js web server, and the client. Enter *CTRL + C* in the terminal window for each application to stop them. + +You now know how to monitor log files from a Node.js web application, deliver the log event data securely into an Elasticsearch Service deployment, and then visualize the results in Kibana in real time. Consult the [Filebeat documentation](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html) to learn more about the ingestion and processing options available for your data. You can also explore our [documentation](../../../manage-data/ingest.md#ec-ingest-methods) to learn all about working in Elasticsearch Service. + diff --git a/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-python-application-using-filebeat.md b/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-python-application-using-filebeat.md index d2edf9c2ef..1671f91e74 100644 --- a/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-python-application-using-filebeat.md +++ b/manage-data/ingest/ingesting-data-from-applications/ingest-logs-from-python-application-using-filebeat.md @@ -31,4 +31,420 @@ $$$ece-python-logs-filebeat$$$ $$$ece-python-logs-send-ess$$$ -$$$ece-python-logs-view-kibana$$$ \ No newline at end of file +$$$ece-python-logs-view-kibana$$$ + +This guide demonstrates how to ingest logs from a Python application and deliver them securely into an Elasticsearch Service deployment. You’ll set up Filebeat to monitor a JSON-structured log file that has standard Elastic Common Schema (ECS) formatted fields, and you’ll then view real-time visualizations of the log events in {{kib}} as they occur. While Python is used for this example, this approach to monitoring log output is applicable across many client types. Check the list of [available ECS logging plugins](https://www.elastic.co/guide/en/ecs-logging/overview/{{ecs-logging}}/intro.html). + +*Time required: 1 hour* + +## Prerequisites [ec_prerequisites_2] + +To complete these steps you need to have [Python](https://www.python.org/) installed on your system as well as the [Elastic Common Schema (ECS) logger](https://www.elastic.co/guide/en/ecs-logging/python/{{ecs-logging-python}}/installation.html) for the Python logging library. 
+ +To install *ecs-logging-python*, run: + +```sh +python -m pip install ecs-logging +``` + + +## Create a deployment [ec_get_elasticsearch_service_3] + +::::{tab-set} + +:::{tab-item} Elastic Cloud Hosted +1. [Get a free trial](https://cloud.elastic.co/registration?page=docs&placement=docs-body). +2. Log into [Elastic Cloud](https://cloud.elastic.co?page=docs&placement=docs-body). +3. Select **Create deployment**. +4. Give your deployment a name. You can leave all other settings at their default values. +5. Select **Create deployment** and save your Elastic deployment credentials. You need these credentials later on. +6. When the deployment is ready, click **Continue** and a page of **Setup guides** is displayed. To continue to the deployment homepage click **I’d like to do something else**. + +Prefer not to subscribe to yet another service? You can also get Elasticsearch Service through [AWS, Azure, and GCP marketplaces](../../../deploy-manage/deploy/elastic-cloud/subscribe-from-marketplace.md). +::: + +:::{tab-item} Elastic Cloud Enterprise +1. Log into the Elastic Cloud Enterprise admin console. +2. Select **Create deployment**. +3. Give your deployment a name. You can leave all other settings at their default values. +4. Select **Create deployment** and save your Elastic deployment credentials. You need these credentials later on. +5. When the deployment is ready, click **Continue** and a page of **Setup guides** is displayed. To continue to the deployment homepage click **I’d like to do something else**. +::: + +:::: + +## Connect securely [ec_connect_securely_2] + +When connecting to Elasticsearch Service you can use a Cloud ID to specify the connection details. Find your Cloud ID by going to the {{kib}} main menu and selecting Management > Integrations, and then selecting View deployment details. + +To connect to, stream data to, and issue queries with Elasticsearch Service, you need to think about authentication. Two authentication mechanisms are supported, *API key* and *basic authentication*. Here, to get you started quickly, we’ll show you how to use basic authentication, but you can also generate API keys as shown later on. API keys are safer and preferred for production environments. + + +## Create a Python script with logging [ec-python-logs-create-script] + +In this step, you’ll create a Python script that generates logs in JSON format, using Python’s standard logging module. + +1. In a local directory, create a new file *elvis.py* and save it with these contents: + + ```python + #!/usr/bin/python + + import logging + import ecs_logging + import time + from random import randint + + #logger = logging.getLogger(__name__) + logger = logging.getLogger("app") + logger.setLevel(logging.DEBUG) + handler = logging.FileHandler('elvis.json') + handler.setFormatter(ecs_logging.StdlibFormatter()) + logger.addHandler(handler) + + print("Generating log entries...") + + messages = [ + "Elvis has left the building.",# + "Elvis has left the oven on.", + "Elvis has two left feet.", + "Elvis was left out in the cold.", + "Elvis was left holding the baby.", + "Elvis left the cake out in the rain.", + "Elvis came out of left field.", + "Elvis exited stage left.", + "Elvis took a left turn.", + "Elvis left no stone unturned.", + "Elvis picked up where he left off.", + "Elvis's train has left the station." 
+ ] + + while True: + random1 = randint(0,15) + random2 = randint(1,10) + if random1 > 11: + random1 = 0 + if(random1<=4): + logger.info(messages[random1], extra={"http.request.body.content": messages[random1]}) + elif(random1>=5 and random1<=8): + logger.warning(messages[random1], extra={"http.request.body.content": messages[random1]}) + elif(random1>=9 and random1<=10): + logger.error(messages[random1], extra={"http.request.body.content": messages[random1]}) + else: + logger.critical(messages[random1], extra={"http.request.body.content": messages[random1]}) + time.sleep(random2) + ``` + + This Python script randomly generates one of twelve log messages, continuously, at a random interval of between 1 and 10 seconds. The log messages are written to file `elvis.json`, each with a timestamp, a log level of *info*, *warning*, *error*, or *critical*, and other data. Just to add some variance to the log data, the *info* message *Elvis has left the building* is set to be the most probable log event. + + For simplicity, there is just one log file and it is written to the local directory where `elvis.py` is located. In a production environment you may have multiple log files, associated with different modules and loggers, and likely stored in `/var/log` or similar. To learn more about configuring logging in Python, check [Logging facility for Python](https://docs.python.org/3/library/logging.md). + + Having your logs written in a JSON format with ECS fields allows for easy parsing and analysis, and for standardization with other applications. A standard, easily parsible format becomes increasingly important as the volume and type of data captured in your logs expands over time. + + Together with the standard fields included for each log entry is an extra *http.request.body.content* field. This extra field is there just to give you some additional, interesting data to work with, and also to demonstrate how you can add optional fields to your log data. Check the [ECS Field Reference](https://www.elastic.co/guide/en/ecs/{{ecs_version}}/ecs-field-reference.html) for the full list of available fields. + +2. Let’s give the Python script a test run. Open a terminal instance in the location where you saved *elvis.py* and run the following: + + ```sh + python elvis.py + ``` + + After the script has run for about 15 seconds, enter *CTRL + C* to stop it. Have a look at the newly generated *elvis.json*. It should contain one or more entries like this one: + + ```json + {"@timestamp":"2021-06-16T02:19:34.687Z","log.level":"info","message":"Elvis has left the building.","ecs":{"version":"1.6.0"},"http":{"request":{"body":{"content":"Elvis has left the building."}}},"log":{"logger":"app","origin":{"file":{"line":39,"name":"elvis.py"},"function":""},"original":"Elvis has left the building."},"process":{"name":"MainProcess","pid":3044,"thread":{"id":4444857792,"name":"MainThread"}}} + ``` + +3. After confirming that *elvis.py* runs as expected, you can delete *elvis.json*. + + +## Set up Filebeat [ec-python-logs-filebeat] + +Filebeat offers a straightforward, easy to configure way to monitor your Python log files and port the log data into Elasticsearch Service. + +**Get Filebeat** + +[Download Filebeat](https://www.elastic.co/downloads/beats/filebeat) and unpack it on the local server from which you want to collect data. 
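+
+For example, on a Linux x86_64 host the download and unpack steps might look like the following. The version number is only an example; use the Filebeat release that matches your deployment:
+
+```sh
+# Example only: substitute the Filebeat version and platform you need.
+curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.13.4-linux-x86_64.tar.gz
+tar xzvf filebeat-8.13.4-linux-x86_64.tar.gz
+cd filebeat-8.13.4-linux-x86_64
+```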
+ +**Configure Filebeat to access Elasticsearch Service** + +In */filebeat-/* (where ** is the directory where Filebeat is installed and ** is the Filebeat version number), open the *filebeat.yml* configuration file for editing. + +```txt +# =============================== Elastic Cloud ================================ + +# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/). + +# The cloud.id setting overwrites the `output.elasticsearch.hosts` and +# `setup.kibana.host` options. +# You can find the `cloud.id` in the Elastic Cloud web UI. +cloud.id: my-deployment:long-hash <1> + +# The cloud.auth setting overwrites the `output.elasticsearch.username` and +# `output.elasticsearch.password` settings. The format is `:`. +cloud.auth: elastic:password <2> +``` + +1. Uncomment the `cloud.id` line and add the deployment’s Cloud ID. You can include or omit the *:* prefix at the beginning of the Cloud ID. Both versions work fine. Find your Cloud ID by going to the {{kib}} main menu and selecting Management > Integrations, and then selecting View deployment details. +2. Uncomment the `cloud.auth` line and add the username and password for your deployment that you recorded when you created your deployment. The format is *:*, for example *elastic:57ugj782kvkwmSKg8uVe*. + + +**Configure Filebeat inputs** + +Filebeat has several ways to collect logs. For this example, you’ll configure log collection manually. + +In the *filebeat.inputs* section of *filebeat.yml*, set *enabled:* to *true*, and set *paths:* to the location of your log file or files. In this example, set the same directory where you saved *elvis.py*: + +```txt +filebeat.inputs: + +# Each - is an input. Most options can be set at the input level, so +# you can use different inputs for various configurations. +# Below are the input specific configurations. + +- type: log + + # Change to true to enable this input configuration. + enabled: true + + # Paths that should be crawled and fetched. Glob based paths. + paths: + - /path/to/log/files/*.json +``` + +You can specify a wildcard (***) character to indicate that all log files in the specified directory should be read. You can also use a wildcard to read logs from multiple directories. For example `/var/log/*/*.log`. + +**Add the JSON input options** + +Filebeat’s input configuration options include several settings for decoding JSON messages. Log files are decoded line by line, so it’s important that they contain one JSON object per line. + +For this example, Filebeat uses the following four decoding options. + +```txt + json.keys_under_root: true + json.overwrite_keys: true + json.add_error_key: true + json.expand_keys: true +``` + +To learn more about these settings, check [JSON input configuration options](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#filebeat-input-log-config-json) and [Decode JSON fields](https://www.elastic.co/guide/en/beats/filebeat/current/decode-json-fields.html) in the Filebeat Reference. + +Append the four JSON decoding options to the *Filebeat inputs* section of *filebeat.yml*, so that the section now looks like this: + +```yaml +# ============================== Filebeat inputs =============================== + +filebeat.inputs: + +# Each - is an input. Most options can be set at the input level, so +# you can use different inputs for various configurations. +# Below are the input specific configurations. + +- type: log + + # Change to true to enable this input configuration. 
+
+**Finish setting up Filebeat**
+
+Filebeat comes with predefined assets for parsing, indexing, and visualizing your data. To load these assets, run the following from the Filebeat installation directory:
+
+```txt
+./filebeat setup -e
+```
+
+::::{important}
+Depending on variables including the installation location, environment, and local permissions, you might need to [change the ownership](https://www.elastic.co/guide/en/beats/libbeat/current/config-file-permissions.html) of *filebeat.yml*. You can also try running the command as *root*: *sudo ./filebeat setup -e*, or you can disable strict permission checks by running the command with the `--strict.perms=false` option.
+::::
+
+
+The setup process takes a couple of minutes. If everything goes well, you should get a confirmation message:
+
+```txt
+Loaded Ingest pipelines
+```
+
+The Filebeat data view (formerly *index pattern*) is now available in Elasticsearch. To verify:
+
+::::{note}
+Beginning with Elastic Stack version 8.0, Kibana *index patterns* have been renamed to *data views*. To learn more, check the Kibana [What’s new in 8.0](https://www.elastic.co/guide/en/kibana/8.0/whats-new.html#index-pattern-rename) page.
+::::
+
+
+1. [Log in to Kibana](../../../deploy-manage/deploy/elastic-cloud/access-kibana.md).
+2. Open the {{kib}} main menu and select **Management** > **{{kib}}** > **Data views**.
+3. In the search bar, search for *filebeat*. You should get `filebeat-*` in the search results.
+
+**Optional: Use an API key to authenticate**
+
+For additional security, instead of using basic authentication you can generate an Elasticsearch API key through the [Elasticsearch Service Console](https://cloud.elastic.co?page=docs&placement=docs-body), and then configure Filebeat to use the new key to connect securely to the Elasticsearch Service deployment.
+
+1. Log in to the [Elasticsearch Service Console](https://cloud.elastic.co?page=docs&placement=docs-body).
+2. Select the deployment name and go to **☰** > **Management** > **Dev Tools**.
+3. Enter the following request:
+
+    ```json
+    POST /_security/api_key
+    {
+      "name": "filebeat-api-key",
+      "role_descriptors": {
+        "logstash_read_write": {
+          "cluster": ["manage_index_templates", "monitor"],
+          "index": [
+            {
+              "names": ["filebeat-*"],
+              "privileges": ["create_index", "write", "read", "manage"]
+            }
+          ]
+        }
+      }
+    }
+    ```
+
+    This creates an API key with the cluster `monitor` privilege, which gives read-only access for determining the cluster state, and `manage_index_templates`, which allows all operations on index templates. Some additional privileges also allow `create_index`, `write`, and `manage` operations for the specified index. The index `manage` privilege is added to enable index refreshes.
+
+4. Click **▶**. The output should be similar to the following:
+
+    ```json
+    {
+      "api_key": "tV1dnfF-GHI59ykgv4N0U3",
+      "id": "2TBR42gBabmINotmvZjv",
+      "name": "filebeat-api-key"
+    }
+    ```
+
+5. Add your API key information to the *Elasticsearch Output* section of `filebeat.yml`, just below *output.elasticsearch:*. Use the format `<id>:<api_key>`. If your results are as shown in this example, enter `2TBR42gBabmINotmvZjv:tV1dnfF-GHI59ykgv4N0U3`.
+6. Add a pound (`#`) sign to comment out the `cloud.auth: elastic:<password>` line, since Filebeat will use the API key instead of the deployment username and password to authenticate.
+
+    ```txt
+    # =============================== Elastic Cloud ================================
+
+    # These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).
+
+    # The cloud.id setting overwrites the `output.elasticsearch.hosts` and
+    # `setup.kibana.host` options.
+    # You can find the `cloud.id` in the Elastic Cloud web UI.
+    cloud.id: my-deployment:yTMtd5VzdKEuP2NwPbNsb3VkLtKzLmldJDcyMzUyNjBhZGP7MjQ4OTZiNTIxZTQyOPY2C2NeOGQwJGQ2YWQ4M5FhNjIyYjQ9ODZhYWNjKDdlX2Yz4ELhRYJ7
+
+    # The cloud.auth setting overwrites the `output.elasticsearch.username` and
+    # `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
+    #cloud.auth: elastic:591KhtuAgTP46by9C4EmhGuk
+
+    # ================================== Outputs ===================================
+
+    # Configure what output to use when sending the data collected by the beat.
+
+    # ---------------------------- Elasticsearch Output ----------------------------
+    output.elasticsearch:
+      # Array of hosts to connect to.
+      api_key: "2TBR42gBabmINotmvZjv:tV1dnfF-GHI59ykgv4N0U3"
+    ```
+
+
+## Send the Python logs to Elasticsearch [ec-python-logs-send-ess]
+
+It’s time to send some log data into {{es}}!
+
+**Launch Filebeat and elvis.py**
+
+Launch Filebeat by running the following from the Filebeat installation directory:
+
+```txt
+./filebeat -e -c filebeat.yml
+```
+
+In this command:
+
+* The *-e* flag sends output to the standard error instead of the configured log output.
+* The *-c* flag specifies the path to the Filebeat config file.
+
+::::{note}
+Just in case the command doesn’t work as expected, check the [Filebeat quick start](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-installation-configuration.html#start) for the detailed command syntax for your operating system. You can also try running the command as *root*: *sudo ./filebeat -e -c filebeat.yml*.
+::::
+
+
+Filebeat should now be running and monitoring the contents of *elvis.json*, which actually doesn’t exist yet. So, let’s create it. Open a new terminal instance and run the *elvis.py* Python script:
+
+```sh
+python elvis.py
+```
+
+Let the script run for a few minutes and maybe brew up a quick coffee or tea ☕. After that, make sure that the *elvis.json* file is generated as expected and is populated with several log entries.
+
+**Verify the log entries in Elasticsearch Service**
+
+The next step is to confirm that the log data has successfully found its way into Elasticsearch Service.
+
+1. [Log in to Kibana](../../../deploy-manage/deploy/elastic-cloud/access-kibana.md).
+2. Open the {{kib}} main menu and select **Management** > **{{kib}}** > **Data views**.
+3. In the search bar, search for *filebeat*. You should get `filebeat-*` in the search results.
+4. Select `filebeat-*`.
+
+The Filebeat data view shows a list of fields and their details.
+
+
+## Create log visualizations in Kibana [ec-python-logs-view-kibana]
+
+Now it’s time to create visualizations based on the Python application log data.
+
+1. Open the Kibana main menu and select **Dashboard**, then **Create dashboard**.
+2. Select **Create visualization**. The [Lens](../../../explore-analyze/visualize/lens.md) visualization editor opens.
+3. In the data view dropdown box, select `filebeat-*`, if it isn’t already selected.
+4. In the **Visualization type** dropdown, select **Bar vertical stacked**, if it isn’t already selected.
+5. Check that the [time filter](../../../explore-analyze/query-filter/filtering.md) is set to **Last 15 minutes**.
+6. From the **Available fields** list, drag and drop the **@timestamp** field onto the visualization builder.
+7. Drag and drop the **log.level** field onto the visualization builder.
+8. In the chart settings area, under **Break down by**, select **Top values of log.level** and set **Number of values** to *4*. Since there are four log severity levels, this parameter sets all of them to appear in the chart legend.
+9. Select **Refresh**. A stacked bar chart now shows the relative frequency of each of the four log severity levels over time.
+
+    ![A screen capture of the Kibana "Bar vertical stacked" visualization with several bars. The X axis shows "Count of records" and the Y axis shows "@timestamp per 30 seconds". Each bar is divided into the four log severity levels.](../../../images/cloud-ec-python-logs-levels.png "")
+
+10. Select **Save and return** to add this visualization to your dashboard.
+
+Let’s create a second visualization.
+
+1. Select **Create visualization**.
+2. Again, make sure that the **Visualization type** dropdown is set to **Bar vertical stacked**.
+3. From the **Available fields** list, drag and drop the **@timestamp** field onto the visualization builder.
+4. Drag and drop the **http.request.body.content** field onto the visualization builder.
+5. In the chart settings area, under **Break down by**, select **Top values of http.request.body.content** and set **Number of values** to *12*. Since there are twelve different log messages, this parameter sets all of them to appear in the chart legend.
+6. Select **Refresh**. A stacked bar chart now shows the relative frequency of each of the log messages over time.
+
+    ![A screen capture of the visualization builder](../../../images/cloud-ec-python-logs-content.png "")
+
+7. Select **Save and return** to add this visualization to your dashboard.
+
+And now for the final visualization.
+
+1. Select **Create visualization**.
+2. In the **Visualization type** dropdown, select **Donut**.
+3. From the list of available fields, drag and drop the **log.level** field onto the visualization builder. A donut chart appears.
+
+    ![A screen capture of a donut chart divided into four sections](../../../images/cloud-ec-python-logs-donut.png "")
+
+4. Select **Save and return** to add this visualization to your dashboard.
+5. Select **Save** and add a title to save your new dashboard.
+
+You now have a Kibana dashboard with three visualizations: a stacked bar chart showing the frequency of each log severity level over time, another stacked bar chart showing the frequency of the various message strings over time (from the added *http.request.body.content* field), and a donut chart showing the relative frequency of each log severity type.
+
+You can add titles to the visualizations, resize and position them as you like, and then save your changes.
+
+**View log data updates in real time**
+
+1. Select **Refresh** on the Kibana dashboard. Since *elvis.py* continues to run and generate log data, your Kibana visualizations update with each refresh.
+
+    ![A screen capture of the completed Kibana dashboard](../../../images/cloud-ec-python-logs-final-dashboard.png "")
+
+2. As your final step, remember to stop Filebeat and the Python script. Enter *CTRL + C* in both your Filebeat terminal and in your *elvis.py* terminal.
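+
+If you’re curious how many log entries made it into the deployment during this exercise, you can also ask {{es}} directly. The following is only a sketch: `<es-endpoint>` and `<password>` are placeholders for your deployment’s Elasticsearch endpoint (shown on the deployment overview page) and the password you saved earlier, and you could equally run the same `_count` request from the Dev Tools console.
+
+```sh
+# Count the documents ingested into the Filebeat indices (placeholders: <es-endpoint>, <password>)
+curl -u "elastic:<password>" "https://<es-endpoint>/filebeat-*/_count?pretty"
+```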
+ +You now know how to monitor log files from a Python application, deliver the log event data securely into an Elasticsearch Service deployment, and then visualize the results in Kibana in real time. Consult the [Filebeat documentation](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html) to learn more about the ingestion and processing options available for your data. You can also explore our [documentation](../../../manage-data/ingest.md#ec-ingest-methods) to learn all about working in Elasticsearch Service. +