4 files changed: +26 -1 lines changed
@@ -16,4 +16,7 @@ The connector has the following system requirements:
* For writing data, MarkLogic 9.0-9 or higher.
* For reading data, MarkLogic 10.0-9 or higher.

+ In addition, if your MarkLogic cluster has multiple hosts in it, it is highly recommended to put a load balancer in front
+ of your cluster and have the MarkLogic Spark connector connect through the load balancer.
+
Please see the [Getting Started guide](getting-started/getting-started.md) to begin using the connector.
@@ -252,9 +252,17 @@ with more partition readers and a higher batch size.
You can also adjust the level of parallelism by controlling how many threads Spark uses for executing partition reads.
Please see your Spark distribution's documentation for further information.

+ ### Using a load balancer
+
+ If your MarkLogic cluster has multiple hosts, it is highly recommended to put a load balancer in front
+ of your cluster and have the connector connect through the load balancer. A typical load balancer will help ensure
+ not only that load is spread across the hosts in your cluster, but that any network or connection failures can be
+ retried without the error propagating to the connector.
+
### Direct connections to hosts

- If your Spark program is able to connect to each host in your MarkLogic cluster, you can set the
+ If you do not have a load balancer in front of your MarkLogic cluster, and your Spark program is able to connect to
+ each host in your MarkLogic cluster, you can set the
`spark.marklogic.client.connectionType` option to `direct`. Each partition reader will then connect to the
host on which the reader's assigned forest resides. This will typically improve performance by reducing the network
traffic, as the host that receives a request will not need to involve any other host in the processing of that request.
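
As an illustration of the "Direct connections to hosts" change above, the following is a minimal sketch of how the `spark.marklogic.client.connectionType` option might be added to a set of connector options. Only that option name and the `direct` value come from the documentation in this diff; the helper name `with_direct_connection`, the `spark.marklogic.client.uri` key, the connection string, and the `"marklogic"` format name in the comment are assumptions made for the example.

```python
def with_direct_connection(options):
    """Return a copy of the connector options with direct connections enabled.

    The input dictionary is left unmodified so the same base options can be
    reused for a connection that goes through a load balancer.
    """
    updated = dict(options)
    updated["spark.marklogic.client.connectionType"] = "direct"
    return updated


# Hypothetical base options; the key and the connection string are placeholders.
base_options = {"spark.marklogic.client.uri": "user:password@host1.example.com:8000"}
read_options = with_direct_connection(base_options)

# With a SparkSession named `spark` available, the options could be passed as
# (the format name here is an assumption; check the connector documentation):
#   df = spark.read.format("marklogic").options(**read_options).load()
```

Keeping the base options separate reflects the wording of the change: direct connections are only suggested when no load balancer sits in front of the cluster, so the setting is applied as an opt-in rather than a default.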
@@ -257,6 +257,13 @@ The effectiveness of this approach can be evaluated by executing the Optic query
[MarkLogic's qconsole application](https://docs.marklogic.com/guide/qconsole/intro), which will execute the query in
a single request as well.

+ ### Using a load balancer
+
+ If your MarkLogic cluster has multiple hosts, it is highly recommended to put a load balancer in front
+ of your cluster and have the connector connect through the load balancer. A typical load balancer will help ensure
+ not only that load is spread across the hosts in your cluster, but that any network or connection failures can be
+ retried without the error propagating to the connector.
+
### More detail on partitions

This section is solely informational and is not required understanding for using the connector
@@ -233,6 +233,13 @@ The rule of thumb above can thus be expressed as:

    Number of partitions * Value of spark.marklogic.write.threadCount <= Number of hosts * number of app server threads

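The rule of thumb in the context lines above can be checked numerically before a write job is configured. The sketch below is only an illustration of that inequality: the function names are hypothetical, and the cluster figures (3 hosts, 32 app server threads, a thread count of 4) are made-up values, not recommendations from the connector documentation.

```python
def within_thread_budget(partitions, write_thread_count, hosts, app_server_threads):
    """Check: partitions * threadCount <= hosts * app server threads."""
    return partitions * write_thread_count <= hosts * app_server_threads


def max_partitions(write_thread_count, hosts, app_server_threads):
    """Largest partition count that still satisfies the rule of thumb."""
    return (hosts * app_server_threads) // write_thread_count


# Illustrative values only: 3 hosts with 32 app server threads each,
# and spark.marklogic.write.threadCount set to 4.
limit = max_partitions(write_thread_count=4, hosts=3, app_server_threads=32)
fits = within_thread_budget(limit, 4, 3, 32)
exceeds = not within_thread_budget(limit + 1, 4, 3, 32)
```

With these numbers the cluster offers 96 app server threads in total, so 24 partitions at 4 threads each is the largest configuration that stays within the budget.
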
+ ### Using a load balancer
+
+ If your MarkLogic cluster has multiple hosts, it is highly recommended to put a load balancer in front
+ of your cluster and have the connector connect through the load balancer. A typical load balancer will help ensure
+ not only that load is spread across the hosts in your cluster, but that any network or connection failures can be
+ retried without the error propagating to the connector.
+
### Error handling

The connector may throw an error during one of two phases of operation - before it begins to write data to MarkLogic,