* Managed web crawler: You can use the [self-managed web crawler](https://github.com/elastic/crawler) instead.
* Managed Search connectors: You can use [self-managed Search connectors](asciidocalypse://docs/elasticsearch/docs/reference/ingestion-tools/search-connectors/self-managed-connectors.md) instead.
deploy-manage/security/secure-clients-integrations.md (+1 -1: 1 addition & 1 deletion)
@@ -9,7 +9,7 @@ You will need to update the configuration for several [clients](httprest-clients
The {{es}} {{security-features}} enable you to secure your {{es}} cluster. But {{es}} itself is only one product within the {{stack}}. It is often the case that other products in the {{stack}} are connected to the cluster and therefore need to be secured as well, or at least communicate with the cluster in a secured way:
* Managed web crawler: You can use the [self-managed web crawler](https://github.com/elastic/crawler) instead.
* Managed Search connectors: You can use [self-managed Search connectors](asciidocalypse://docs/elasticsearch/docs/reference/ingestion-tools/search-connectors/self-managed-connectors.md) instead.
raw-migrated-files/stack-docs/elastic-stack/upgrading-elastic-stack-on-prem.md (+1 -1: 1 addition & 1 deletion)
@@ -5,7 +5,7 @@ Once you are [prepared to upgrade](../../../deploy-manage/upgrade/deployment-or-
1. Consider closing {{ml}} jobs before you start the upgrade process. While {{ml}} jobs can continue to run during a rolling upgrade, it increases the overhead on the cluster during the upgrade process.
2. Upgrade the components of your Elastic Stack in the following order:
troubleshoot/elasticsearch/elasticsearch-hadoop/elasticsearch-for-apache-hadoop.md (+15 -15: 15 additions & 15 deletions)
@@ -9,59 +9,59 @@ mapped_pages:
Unfortunately, sometimes things do not go as expected and your elasticsearch-hadoop job execution might go awry: incorrect data might be read or written, the job might take significantly longer than expected or you might face some exception. This section tries to provide help and tips for doing your own diagnostics, identifying the problem and hopefully fixing it.
-Test that {{es}} is reachable from the Spark/Hadoop cluster where the job is running. Your machine might reach it but that is not where the actual code will be running. If ES is accessible, minimize the number of tasks and their bulk size; if {{es}} is overloaded, it will keep falling behind, GC will kick in and eventually its nodes will become unresponsive causing clients to think the machines have died. See the [*Performance considerations*](asciidocalypse://docs/elasticsearch-hadoop/docs/reference/performance-considerations.md) section for more details.
+Test that {{es}} is reachable from the Spark/Hadoop cluster where the job is running. Your machine might reach it but that is not where the actual code will be running. If ES is accessible, minimize the number of tasks and their bulk size; if {{es}} is overloaded, it will keep falling behind, GC will kick in and eventually its nodes will become unresponsive causing clients to think the machines have died. See the [*Performance considerations*](elasticsearch-hadoop://reference/performance-considerations.md) section for more details.
-### Test your network [_test_your_network]
+### Test your network [_test_your_network]
Way too many times, folks use their local, development settings in a production environment. Double check that {{es}} is accessible from your production environments, check the host address and port and that the machines where the Hadoop/Spark job is running can access {{es}} (use `curl`, `telnet` or whatever tool you have available).
Using `localhost` (aka the default) in a production environment is simply a misconfiguration.
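Where `curl` or `telnet` is not available on the worker machines, the same check can be sketched in a few lines of Python. This is a minimal illustration, not part of elasticsearch-hadoop: the helper name `can_reach` is made up for this example, and port `9200` is only the usual {{es}} HTTP default, so substitute whatever host and port your job is actually configured with.

```python
import socket


def can_reach(host: str, port: int = 9200, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; 9200 is assumed to be the
    Elasticsearch HTTP port, adjust it to match your cluster.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

Run it on a node of the Spark/Hadoop cluster rather than on your workstation, for example `can_reach("your-es-host")`, since that is where the job code actually executes. A successful TCP connection only shows that the port is open; it does not prove that the service answering is a healthy {{es}} node.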
-### Triple check the classpath [_triple_check_the_classpath]
+### Triple check the classpath [_triple_check_the_classpath]
Make sure to use only one version of elasticsearch-hadoop in your classpath. While it might not be obvious, the classpath in Hadoop/Spark is assembled from multiple folders; furthermore, there are no guarantees what version is going to be picked up first by the JVM. To avoid obscure issues, double check your classpath and make sure there is only one version of the library in there, the one you are interested in.
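One way to spot duplicates is to scan the effective classpath string for connector jars and collect the versions found. The sketch below is an illustration only: the function name `connector_versions` is hypothetical, and the regular expression assumes jar files named like `elasticsearch-hadoop-<version>.jar` or `elasticsearch-spark-<variant>-<version>.jar`, which may not match a repackaged or shaded build.

```python
import os
import re

# Assumed file-name pattern; adjust if your distribution renames the jars.
JAR_RE = re.compile(r"^elasticsearch-(?:hadoop|spark)\S*?-(\d+(?:\.\d+)+\S*)\.jar$")


def connector_versions(classpath: str, sep: str = os.pathsep) -> set:
    """Return the set of connector versions found in a classpath string."""
    versions = set()
    for entry in classpath.split(sep):
        # Only the file name matters, not the folder it was picked up from.
        name = os.path.basename(entry.strip())
        match = JAR_RE.match(name)
        if match:
            versions.add(match.group(1))
    return versions
```

If the returned set holds more than one version, the classpath contains conflicting copies of the library and one of them should be removed. Jars bundled inside an uber-jar will not show up in such a scan, so an empty or single-element result is a hint rather than a guarantee.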
-### Isolate the issue [_isolate_the_issue]
+### Isolate the issue [_isolate_the_issue]
When encountering a problem, do your best to isolate it. This can be quite tricky and many times, it is the hardest part so take your time with it. Take baby steps and try to eliminate unnecessary code or settings in small chunks until you end up with a small, tiny example that exposes your problem.
-### Use a speedy, local environment [_use_a_speedy_local_environment]
+### Use a speedy, local environment [_use_a_speedy_local_environment]
A lot of Hadoop jobs are batch in nature which means they take a long time to execute. To track down the issue faster, use whatever means possible to speed-up the feedback loop: use a small/tiny dataset (no need to load millions of records, some dozens will do) and use a local/pseudo-distributed Hadoop cluster alongside an Elasticsearch node running on your development machine.
-### Check your settings [_check_your_settings]
+### Check your settings [_check_your_settings]
Double check your settings and use constants or replicate configurations wherever possible. It is easy to make typos so try to reduce manual configuration by using properties files or constant interfaces/classes. If you are not sure what a setting is doing, remove it or change its value and see whether it affects your job output.
-### Verify the input and output [_verify_the_input_and_output]
+### Verify the input and output [_verify_the_input_and_output]
Take a close look at your input and output; this is typically easier to do with Elasticsearch (the service out-lives the job/script, is real-time and can be accessed right away in a flexible manner, including the command-line). If your data is not persisted (either in Hadoop or Elasticsearch), consider doing that temporarily to validate each step of your work-flow.
-### Monitor [_monitor]
+### Monitor [_monitor]
While logging helps with bugs and errors, for runtime behavior we strongly recommend doing proper monitoring of your Hadoop and {{es}} cluster. Both are outside the scope of this chapter however there are several popular, free solutions out there that are worth investigating. For {{es}}, we recommend [Marvel](https://www.elastic.co/products/marvel), a free monitoring tool (for development) created by the team behind {{es}}. Monitoring gives insight into how the cluster is actually behaving and helps you correlate behavior. If a monitoring solution is not possible, use the metrics provided by Hadoop, {{es}} and elasticsearch-hadoop to evaluate the runtime behavior.
-### Increase logging [_increase_logging]
+### Increase logging [_increase_logging]
-Logging gives you a lot of insight into what is going on. Hadoop, Spark and {{es}} have extensive logging mechanisms as [does](asciidocalypse://docs/elasticsearch-hadoop/docs/reference/logging.md) elasticsearch-hadoop however use that judiciously: too much logging can hide the actual issue so again, do it in small increments.
+Logging gives you a lot of insight into what is going on. Hadoop, Spark and {{es}} have extensive logging mechanisms as [does](elasticsearch-hadoop://reference/logging.md) elasticsearch-hadoop however use that judiciously: too much logging can hide the actual issue so again, do it in small increments.
-### Measure, do not assume [_measure_do_not_assume]
+### Measure, do not assume [_measure_do_not_assume]
When encountering a performance issue, do some benchmarking first, in as much isolation as possible. Do not simply assume a certain component is slow; make sure/prove it actually is. Otherwise, more often than not, one might find herself fixing the wrong problem (and typically creating a new one).
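A measurement does not need to be elaborate to be useful. As a rough sketch, a suspect step can be wrapped in a callable and timed several times in isolation; the helper name `median_seconds` is made up for this example and is not part of any Elastic tooling.

```python
import statistics
import time


def median_seconds(step, repeats: int = 5) -> float:
    """Run `step` (a zero-argument callable) `repeats` times; return the median wall-clock time."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        step()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive to a single slow outlier (GC pause, cold cache).
    return statistics.median(samples)
```

Timing the read, the transformation and the write separately on the same small dataset usually shows which part deserves attention before any tuning starts.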
-### Find a baseline [_find_a_baseline]
+### Find a baseline [_find_a_baseline]
Indexing performance depends *heavily* on the type of data being targeted and its mapping. Same goes for searching but add the query definition to the mix. As mentioned before, experiment and measure the various parts of your dataset to find the sweet-spot of your environment before importing/searching big amounts of data.
@@ -77,7 +77,7 @@ If something is not working, there are two possibilities:
Whichever it is, a **clear** description of the problem will help other users to help you. The more complete your report is, the quicker you will receive help from users!
-### What information is useful? [_what_information_is_useful]
+### What information is useful? [_what_information_is_useful]
* OS & JVM version
* Hadoop / Spark version / distribution
@@ -91,7 +91,7 @@ Whichever it is, a **clear** description of the problem will help other users to
If you don’t provide all of the information, then it may be difficult for others to figure out where the issue is.
-### Where do I post my information? [_where_do_i_post_my_information]
+### Where do I post my information? [_where_do_i_post_my_information]
Please don’t paste long lines of code in the mailing list or the IRC – it is difficult to read, and people will be less likely to take the time to help.