@@ -241,6 +241,46 @@ spark.read.format("com.marklogic.spark") \
     .save()
 ```
 
+### Processing multiple rows in a single call
+
+By default, the connector sends a single row to your custom code. In many use cases, particularly when writing
+documents, you will achieve far better performance by configuring the connector to send many rows in a single
+call to your custom code.
+
+The configuration option `spark.marklogic.write.batchSize` controls the number of row values sent to the custom code
+in a single call. If not specified, this defaults to 1 (as opposed to 100 when writing rows as documents). If set to a
+value greater than one, the values will be sent in the following manner:
+
+1. If a custom schema is used, the JSON objects representing the set of rows in the batch will first be added to a
+JSON array, and then the array will be set to the external variable.
+2. Otherwise, the row values from the "URI" column will be concatenated together with a comma as a delimiter.
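As a sketch of how the batch size is set from PySpark, assuming the custom code is invoked via the connector's `spark.marklogic.write.invoke` option; the module path and connection details below are illustrative assumptions, not values from this guide:

```
df.write.format("com.marklogic.spark") \
    .option("spark.marklogic.client.uri", "user:password@localhost:8000") \
    .option("spark.marklogic.write.invoke", "/process.sjs") \
    .option("spark.marklogic.write.batchSize", 100) \
    .mode("append") \
    .save()
```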
+
+For approach #2, an alternate delimiter can be configured via `spark.marklogic.write.externalVariableDelimiter`. This
+is needed if your "URI" values may contain commas. Regardless of the delimiter value, you will
+typically use code like that shown below to split the "URI" value into many values:
262+
+```
+for (const uri of URI.split(',')) {
+  // Process each row value here.
+}
+```
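As an illustration of approach #2 with an alternate delimiter, here is a minimal plain-JavaScript sketch. The semicolon delimiter and sample URIs are made-up values; in real custom code, `URI` arrives as the external variable rather than a hardcoded string:

```
// Hypothetical value received when batchSize is 3 and
// spark.marklogic.write.externalVariableDelimiter is set to ";",
// useful here because these made-up URIs contain commas.
const URI = "/doc,1.json;/doc,2.json;/doc,3.json";
const uris = URI.split(';');
for (const uri of uris) {
  // Process each row value here.
}
```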
+
+When using a custom schema, you will typically use [xdmp.fromJSON](https://docs.marklogic.com/xdmp.fromJSON) to convert
+the value passed to your custom code into a JSON array:
+
+```
+// Assumes that URI is a JSON array node because a custom schema is being used.
+const array = fn.head(xdmp.fromJSON(URI));
+```
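Once materialized, the result can be iterated like any JavaScript array. A minimal sketch with made-up row data and column names; in real custom code, `array` would come from `fn.head(xdmp.fromJSON(URI))`:

```
// Hypothetical batch of three rows from a custom schema with
// two made-up columns; the real array comes from xdmp.fromJSON.
const array = [
  { myId: 1, myText: "hello" },
  { myId: 2, myText: "world" },
  { myId: 3, myText: "goodbye" }
];
const ids = [];
for (const row of array) {
  // Process each row object here; e.g., collect its ID.
  ids.push(row.myId);
}
```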
+
+Processing multiple rows in a single call can significantly improve performance by reducing the number of calls made
+to MarkLogic. For example, if you are writing documents with your custom code, it is recommended to try a batch size of
+100 or greater and test how much performance improves. The
+[MarkLogic monitoring dashboard](https://docs.marklogic.com/guide/monitoring/dashboard) is a very useful tool for
+examining how many requests the connector sends to MarkLogic and how quickly each request is processed,
+along with overall resource consumption.
+
 ### External variable configuration
 
 As shown in the examples above, the custom code for processing a row must have an external variable named "URI". If
@@ -296,27 +336,6 @@ allowing you to access its data:
 const doc = fn.head(xdmp.fromJSON(URI));
 ```
 
-### Processing multiple rows in a single call
-
-The configuration option `spark.marklogic.write.batchSize` controls the number of row values sent to the custom code
-in a single call. If not specified, this defaults to 1 (as opposed to 100 when writing rows as documents). If set to a
-value greater than one, then the values will be sent in the following manner:
-
-1. If a custom schema is used, then the JSON objects representing the set of rows in the batch will first be added to a
-JSON array, and then the array will be set to the external variable.
-2. Otherwise, the row values from the "URI" column will be concatenated together with a comma as a delimiter.
-
-For approach #2, an alternate delimiter can be configured via `spark.marklogic.write.externalVariableDelimiter`. This
-would be needed in case your "URI" values may have commas in them.
-
-When using a custom schema, you will typically use [xdmp.fromJSON](https://docs.marklogic.com/xdmp.fromJSON) to convert
-the value passed to your custom code into a JSON array:
-
-```
-// Assumes that URI is a JSON array node because a custom schema is being used.
-const array = fn.head(xdmp.fromJSON(URI));
-```
-
 ### Streaming support
 
 Spark's support for [streaming writes](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html)