Elasticsearch Sink

Client

Authentication

The API accepts 3 different authentication methods:

- Api key auth (http_api_key)
- Basic auth (http)
- Bearer auth (http)

Api key auth (http_api_key)

Elasticsearch APIs support key-based authentication. You must create an API key and use the encoded value in the request header. For example:

curl -X GET "${ES_URL}/_cat/indices?v=true" \
  -H "Authorization: ApiKey ${API_KEY}"

To get API keys, use the /_security/api_key APIs.

Basic auth (http)

Basic auth tokens are constructed with the Basic keyword, followed by a space, followed by a base64-encoded string of your username:password (separated by a : colon).

Example: send an Authorization: Basic aGVsbG86aGVsbG8= HTTP header with your requests to authenticate with the API.
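The construction described above can be sketched in Python using only the standard library. The function name basic_auth_header is a hypothetical helper, not part of any Elasticsearch client; the hello/hello credentials are the ones encoded in the example header.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the Authorization header value for HTTP Basic auth."""
    # Join the credentials with a colon, then base64-encode the UTF-8 bytes.
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


print(basic_auth_header("hello", "hello"))  # Basic aGVsbG86aGVsbG8=
```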

Bearer auth (http)

Elasticsearch APIs support the use of bearer tokens in the Authorization HTTP header to authenticate with the API. For examples, refer to Token-based authentication services.

Add data

You index data into Elasticsearch by sending JSON objects (documents) through the REST APIs. Whether you have structured or unstructured text, numerical data, or geospatial data, Elasticsearch efficiently stores and indexes it in a way that supports fast searches.

For timestamped data such as logs and metrics, you typically add documents to a data stream made up of multiple auto-generated backing indices.

To add a single document to an index, submit an HTTP POST request that targets the index.

POST /customer/_doc/1
{
  "firstname": "Jennifer",
  "lastname": "Walters"
}

This request automatically creates the customer index if it doesn’t exist, adds a new document that has an ID of 1, and stores and indexes the firstname and lastname fields.

The new document is available immediately from any node in the cluster. You can retrieve it with a GET request that specifies its document ID:

GET /customer/_doc/1

To add multiple documents in one request, use the _bulk API. Bulk data must be newline-delimited JSON (NDJSON). Each line must end in a newline character (\n), including the last line.
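As a rough illustration of the single-document index request shown earlier in this section, the following Python sketch builds the same POST with the standard library. The helper name index_document_request, the localhost URL, and the placeholder API key are assumptions for illustration; the request is only constructed here, not sent.

```python
import json
import urllib.request


def index_document_request(base_url, index, doc_id, document, api_key):
    """Build (but do not send) a POST /<index>/_doc/<id> request."""
    url = f"{base_url.rstrip('/')}/{index}/_doc/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        method="POST",
        headers={
            # Hypothetical placeholder; use the encoded API key value.
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
    )


req = index_document_request(
    "http://localhost:9200",  # assumed local cluster address
    "customer",
    "1",
    {"firstname": "Jennifer", "lastname": "Walters"},
    "<encoded-api-key>",
)
# urllib.request.urlopen(req) would send the request to a running cluster.
```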

Token-based authentication services

The Elastic Stack security features authenticate users by using realms and one or more token-based authentication services. The token-based authentication services are used for authenticating and managing tokens. You can attach these tokens to requests that are sent to Elasticsearch and use them as credentials. When Elasticsearch receives a request that must be authenticated, it consults the token-based authentication services first, and then the realm chain.

The security features provide the following built-in token-based authentication services, which are listed in the order they are consulted:

service-accounts

The service accounts use either the create service account token API or the elasticsearch-service-tokens CLI tool to generate service account tokens. To use a service account token, include the generated token value in a request with an Authorization: Bearer header:

curl -H "Authorization: Bearer AAEAAWVsYXN0aWMvZ...mXQtc2VydmMTpyNXdkYmRib1FTZTl2R09Ld2FKR0F3" http://localhost:9200/_cluster/health

Important: Do not attempt to use service accounts for authenticating individual users. Service accounts can only be authenticated with service tokens, which are not applicable to regular users.

token-service

The token service uses the get token API to generate access tokens and refresh tokens based on the OAuth2 specification. The access token is a short-lived token. By default, it expires after 20 minutes but it can be configured to last a maximum of 1 hour. It can be refreshed by using a refresh token, which has a lifetime of 24 hours. The access token is a bearer token. You can use it by sending a request with an Authorization header with a value that has the prefix "Bearer " followed by the value of the access token. For example:

curl -H "Authorization: Bearer dGhpcyBpcyBub3Qx5...F0YS4gZG8gbm90IHRyeSB0byByZWFkIHRva2VuIQ==" http://localhost:9200/_cluster/health

api-key-service

The API key service uses the create API key API to generate API keys. By default, the API keys do not expire. When you make a request to create API keys, you can specify an expiration and permissions for the API key. The permissions are limited by the authenticated user’s permissions. You can use the API key by sending a request with an Authorization header with a value that has the prefix "ApiKey " followed by the credentials. The credentials are the base64 encoding of the API key ID and the API key joined by a colon. For example:

curl -H "Authorization: ApiKey VnVhQ2ZHY0JDZGJrU...W0tZTVhT3g6dWkybHAyYXhUTm1zeWFrd0dk5udw==" http://localhost:9200/_cluster/health

Depending on your use case, you may want to decide on the lifetime of the tokens generated by these services. You can then use this information to decide which service to use to generate and manage the tokens. Non-expiring API keys may seem like the easy option, but you must consider the security implications that come with non-expiring keys. Both the token-service and api-key-service permit you to invalidate the tokens. See invalidate token API and invalidate API key API.
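The ApiKey credentials format described for the api-key-service (base64 of the key ID and the key joined by a colon) can be sketched as follows. The helper name api_key_header and the sample ID and key are made up for illustration; real values come from the create API key API.

```python
import base64


def api_key_header(key_id: str, api_key: str) -> str:
    """Return the Authorization header value for an Elasticsearch API key."""
    # The credentials are base64("<id>:<api_key>").
    joined = f"{key_id}:{api_key}".encode("utf-8")
    return "ApiKey " + base64.b64encode(joined).decode("ascii")


# Hypothetical ID and key, for illustration only.
header = api_key_header("my-key-id", "my-key-secret")
```

If the create API key response already includes an encoded value, that value can be used directly after the "ApiKey " prefix instead of encoding the pair by hand.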

Important: Authentication support for JWT bearer tokens was introduced in Elasticsearch 8.2 through the JWT authentication realm, which cannot be enabled through the token-based authentication services. Realms offer flexible order and configurations of zero, one, or multiple JWT realms.

Add Data

PUT /_bulk

# $'...' (bash/zsh ANSI-C quoting) turns each \n into a real newline, as NDJSON requires.
curl \
  --request PUT 'http://api.example.com/_bulk' \
  --header "Authorization: ApiKey $API_KEY" \
  --header "Content-Type: application/json" \
  --data-binary $'{ "index" : { "_index" : "test", "_id" : "1" } }\n{ "field1" : "value1" }\n{ "delete" : { "_index" : "test", "_id" : "2" } }\n{ "create" : { "_index" : "test", "_id" : "3" } }\n{ "field1" : "value3" }\n{ "update" : {"_id" : "1", "_index" : "test"} }\n{ "doc" : {"field2" : "value2"} }\n'

Request examples

Run POST _bulk to perform multiple operations.

{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }

Response examples (200)

{
  "took": 30,
  "errors": false,
  "items": [
    { "index": { "_index": "test", "_id": "1", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "status": 201, "_seq_no": 0, "_primary_term": 1 } },
    { "delete": { "_index": "test", "_id": "2", "_version": 1, "result": "not_found", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "status": 404, "_seq_no": 1, "_primary_term": 2 } },
    { "create": { "_index": "test", "_id": "3", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "status": 201, "_seq_no": 2, "_primary_term": 3 } },
    { "update": { "_index": "test", "_id": "1", "_version": 2, "result": "updated", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "status": 200, "_seq_no": 3, "_primary_term": 4 } }
  ]
}
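Because a bulk response reports one result per action, a client usually walks the items array rather than relying on the HTTP status alone. The sketch below does this for a trimmed copy of the response example above; summarize_bulk_items is a hypothetical helper name. Note that in the example the top-level errors flag is false even though the delete item carries status 404 (not_found).

```python
def summarize_bulk_items(bulk_response: dict) -> list:
    """Return (action, _id, status, result) for each item in a bulk response."""
    summary = []
    for item in bulk_response.get("items", []):
        # Each item is a single-key object, e.g. {"index": {...}}.
        for action, outcome in item.items():
            summary.append(
                (action, outcome.get("_id"), outcome.get("status"), outcome.get("result"))
            )
    return summary


# Trimmed copy of the response example (only the fields used here).
sample = {
    "took": 30,
    "errors": False,
    "items": [
        {"index": {"_id": "1", "result": "created", "status": 201}},
        {"delete": {"_id": "2", "result": "not_found", "status": 404}},
        {"create": {"_id": "3", "result": "created", "status": 201}},
        {"update": {"_id": "1", "result": "updated", "status": 200}},
    ],
}

for row in summarize_bulk_items(sample):
    print(row)
```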

Bulk index or delete documents

POST /_bulk

Supported authentication: Api key auth, Basic auth, Bearer auth

Perform multiple index, create, delete, and update actions in a single request. This reduces overhead and can greatly increase indexing speed.

If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:

- To use the create action, you must have the create_doc, create, index, or write index privilege. Data streams support only the create action.
- To use the index action, you must have the create, index, or write index privilege.
- To use the delete action, you must have the delete or write index privilege.
- To use the update action, you must have the index or write index privilege.
- To automatically create a data stream or index with a bulk API request, you must have the auto_configure, create_index, or manage index privilege.
- To make the result of a bulk operation visible to search using the refresh parameter, you must have the maintenance or manage index privilege.

Automatic data stream creation requires a matching index template with data stream enabled.

The actions are specified in the request body using a newline delimited JSON (NDJSON) structure:

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

The index and create actions expect a source on the next line and have the same semantics as the op_type parameter in the standard index API. A create action fails if a document with the same ID already exists in the target. An index action adds or replaces a document as necessary.

NOTE: Data streams support only the create action. To update or delete a document in a data stream, you must target the backing index containing the document.

An update action expects that the partial doc, upsert, and script and its options are specified on the next line.

A delete action does not expect a source on the next line and has the same semantics as the standard delete API.

NOTE: The final line of data must end with a newline character (\n). Each newline character may be preceded by a carriage return (\r). When sending NDJSON data to the _bulk endpoint, use a Content-Type header of application/json or application/x-ndjson. Because this format uses literal newline characters (\n) as delimiters, make sure that the JSON actions and sources are not pretty printed.
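The formatting rules in this note (one JSON object per line, no pretty printing, trailing newline on the last line) can be met with a small serializer. This is a minimal sketch; to_ndjson is a hypothetical helper name, and the sample actions are taken from the request example in this document.

```python
import json


def to_ndjson(lines) -> str:
    """Serialize an iterable of JSON-compatible objects as NDJSON."""
    # Without an indent argument, json.dumps emits each object on one line;
    # newlines inside string values are escaped as \n.
    return "".join(json.dumps(obj, separators=(",", ":")) + "\n" for obj in lines)


body = to_ndjson([
    {"index": {"_index": "test", "_id": "1"}},
    {"field1": "value1"},
    {"delete": {"_index": "test", "_id": "2"}},
])
print(body, end="")
```

The resulting string would be sent as the request body with a Content-Type of application/json or application/x-ndjson.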

If you provide a target in the request path, it is used for any actions that don't explicitly specify an _index argument.

A note on the format: the idea here is to make processing as fast as possible. As some of the actions are redirected to other shards on other nodes, only action_meta_data is parsed on the receiving node side.

Client libraries using this protocol should try to do something similar on the client side and reduce buffering as much as possible.

There is no "correct" number of actions to perform in a single bulk request. Experiment with different settings to find the optimal size for your particular workload. Note that Elasticsearch limits the maximum size of an HTTP request to 100mb by default, so clients must ensure that no request exceeds this size. It is not possible to index a single document that exceeds the size limit, so you must pre-process any such documents into smaller pieces before sending them to Elasticsearch. For instance, split documents into pages or chapters before indexing them, or store raw binary data in a system outside Elasticsearch and replace the raw data with a link to the external system in the documents that you send to Elasticsearch.
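One way a client might keep each bulk request under the size limit is to batch actions by encoded byte size. The sketch below is an assumption about how such batching could look, not a prescribed approach: chunk_bulk_actions and max_bytes are hypothetical names, and the 120-byte limit is deliberately tiny so the splitting is visible.

```python
import json


def chunk_bulk_actions(actions, max_bytes):
    """Yield NDJSON byte strings, each at most max_bytes long.

    actions is an iterable of (action_meta, source) pairs; source is None
    for actions such as delete that take no source line.
    """
    batch, batch_size = [], 0
    for meta, source in actions:
        entry = json.dumps(meta, separators=(",", ":")) + "\n"
        if source is not None:
            entry += json.dumps(source, separators=(",", ":")) + "\n"
        encoded = entry.encode("utf-8")
        if len(encoded) > max_bytes:
            # A single oversized document cannot be sent; split it first.
            raise ValueError("single action exceeds max_bytes")
        if batch and batch_size + len(encoded) > max_bytes:
            yield b"".join(batch)
            batch, batch_size = [], 0
        batch.append(encoded)
        batch_size += len(encoded)
    if batch:
        yield b"".join(batch)


actions = [
    ({"index": {"_index": "test", "_id": str(i)}}, {"field1": f"value{i}"})
    for i in range(5)
]
chunks = list(chunk_bulk_actions(actions, max_bytes=120))
print(len(chunks), [len(c) for c in chunks])
```

An action and its source line are always kept in the same chunk, so every chunk is a valid bulk body on its own.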

Search

Indexed documents are available for search in near real-time. The following search matches all customers with a first name of Jennifer in the customer index.

GET customer/_search
{
  "query" : {
    "match" : { "firstname": "Jennifer" }
  }
}

Running Elasticsearch locally

https://www.elastic.co/docs/deploy-manage/deploy/self-managed/local-development-installation-quickstart

Basic API quickstart

https://www.elastic.co/docs/solutions/search/elasticsearch-basics-quickstart

Curl Commands

curl -X PUT "localhost:9200/products" \
  -H "Authorization: ApiKey alplX1hwWUJmQ05FN2I4T0pXSUI6Sk5xVV96VHIza2JtM3hsLWNhWTI0dw==" \
  -H "Content-Type: application/json" \
  -d '{ "settings": { "number_of_shards": 1, "number_of_replicas": 0 } }'

curl -X GET "localhost:9200/products/_search?pretty" \
  -H "Authorization: ApiKey alplX1hwWUJmQ05FN2I4T0pXSUI6Sk5xVV96VHIza2JtM3hsLWNhWTI0dw==" \
  -H "Content-Type: application/json" \
  -d '{ "query": { "match": { "name": "steak" } } }'