docs/reading-data/documents.md

a filtered search will both return accurate results and may be faster. Ideally though, you can configure indexes on your
database to allow for an unfiltered search, which will return accurate results and be faster than a filtered search.

## Using secondary URIs queries

As of version 2.7.0, the connector supports executing secondary queries to retrieve additional URIs beyond those specified in your initial document query. This feature is useful when you need to read documents that are related to your initial set of documents through shared data values or other relationships.

When using secondary URIs queries, the connector will first retrieve the URIs from your primary query (via `spark.marklogic.read.documents.uris` or other document query options), then execute your secondary query code with access to those URIs, and finally return documents for both the original URIs and any additional URIs returned by the secondary query.
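
The plain-Python sketch below is a rough mental model of the three steps above. It builds the final URI list from a primary URI list and a stand-in for the secondary query. It is not the connector's implementation. `combine_uris` and `related` are hypothetical names. The sketch also assumes that a URI returned a second time by the secondary query is only kept once; this page does not document that behavior.

```python
from typing import Callable, Iterable, List


def combine_uris(
    primary_uris: List[str],
    run_secondary_query: Callable[[List[str]], Iterable[str]],
) -> List[str]:
    """Hypothetical model of the secondary URIs flow (not the connector's code).

    1. Start from the URIs selected by the primary document query.
    2. Pass those URIs to the secondary query, mirroring the URIs / $URIs
       external variable.
    3. Return the original URIs plus any additional URIs, in first-seen order.
       Dropping duplicates is an assumption made for this sketch.
    """
    combined = list(dict.fromkeys(primary_uris))
    seen = set(combined)
    for uri in run_secondary_query(list(primary_uris)):
        if uri not in seen:
            seen.add(uri)
            combined.append(uri)
    return combined


if __name__ == "__main__":
    # Stand-in for secondary query code: map each primary URI to a "related" URI,
    # and also return one URI that was already in the primary set.
    def related(uris: List[str]) -> List[str]:
        return [u.replace("/author/", "/related/") for u in uris] + ["/author/1.json"]

    print(combine_uris(["/author/1.json", "/author/2.json"], related))
    # prints ['/author/1.json', '/author/2.json', '/related/1.json', '/related/2.json']
```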

### Basic usage

You can execute a secondary query using JavaScript via the `spark.marklogic.read.secondaryUris.javascript` option:
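
Here is a minimal sketch of this option in PySpark style. Only the `spark.marklogic.read.documents.uris` and `spark.marklogic.read.secondaryUris.javascript` option names come from this page. Several other details are assumptions for illustration:

- the `spark.marklogic.client.uri` connection option;
- the newline-separated URI list;
- the JavaScript body, which assumes JSON documents with a top-level `CitationID` property.

The sketch only builds the options, so it runs without Spark or MarkLogic.

```python
# Hypothetical secondary query: collect CitationID values from the primary
# documents (exposed through the external variable URIs) and return the URIs
# of documents that share those values. The CitationID property is assumed.
SECONDARY_JS = """
var URIs;
const ids = [];
for (const uri of URIs) {
  const doc = cts.doc(uri);
  if (doc !== null && doc.toObject().CitationID !== undefined) {
    ids.push(doc.toObject().CitationID);
  }
}
cts.uris(null, null, cts.jsonPropertyValueQuery("CitationID", ids));
"""


def secondary_js_options(client_uri: str, uris: list, javascript: str) -> dict:
    """Build reader options for a primary URI list plus an inline JavaScript secondary query."""
    return {
        "spark.marklogic.client.uri": client_uri,  # assumed connection option
        # Assumed to accept a newline-separated list of URIs.
        "spark.marklogic.read.documents.uris": "\n".join(uris),
        "spark.marklogic.read.secondaryUris.javascript": javascript,
    }


if __name__ == "__main__":
    opts = secondary_js_options(
        "user:password@localhost:8003",  # placeholder credentials and port
        ["/author/author1.json", "/author/author12.json"],
        SECONDARY_JS,
    )
    # With PySpark and the connector available, the options could be passed as:
    # df = spark.read.format("marklogic").options(**opts).load()
    for key in sorted(opts):
        print(key)
```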

Your secondary query code has access to the URIs from your primary query through:

- **JavaScript**: An external variable named `URIs` containing the array of URIs
- **XQuery**: An external variable named `$URIs` containing a JSON array of the URIs

The examples above show how to use these URIs to find related documents: in this case, finding other author documents that share the same `CitationID` values as the original documents.
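
The relationship described above can be illustrated without a database. The plain-Python sketch below starts from the primary URIs, collects their `CitationID` values, and returns the other documents that share one of them. Several parts are hypothetical: the in-memory `docs` mapping, which stands in for the database; the assumption of one `CitationID` per document; and the `related_by_citation` name.

```python
from typing import Dict, List


def related_by_citation(primary: List[str], docs: Dict[str, str]) -> List[str]:
    """Return URIs of non-primary documents sharing a CitationID with a primary document.

    `docs` maps each URI to its CitationID and stands in for the database.
    """
    primary_set = set(primary)
    wanted = {docs[uri] for uri in primary if uri in docs}
    return sorted(uri for uri, cid in docs.items() if cid in wanted and uri not in primary_set)


if __name__ == "__main__":
    docs = {
        "/author/1.json": "C1",
        "/author/2.json": "C2",
        "/author/3.json": "C1",
        "/author/4.json": "C3",
    }
    print(related_by_citation(["/author/1.json"], docs))  # prints ['/author/3.json']
```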

### Using module invocation

You can invoke a JavaScript or XQuery module from your application's modules database via the `spark.marklogic.read.secondaryUris.invoke` option:
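
The sketch below adds the invoke option to an existing set of reader options. The module path `/example/find-related-uris.sjs`, the base options, and the `with_invoked_module` helper are all hypothetical. The module is assumed to exist already in the modules database used by the connector's app server.

```python
def with_invoked_module(base_options: dict, module_path: str) -> dict:
    """Return a copy of base_options that also invokes a secondary query module."""
    options = dict(base_options)  # copy so the caller's dict is not modified
    options["spark.marklogic.read.secondaryUris.invoke"] = module_path
    return options


if __name__ == "__main__":
    base = {
        # Placeholder connection string and primary query (assumptions).
        "spark.marklogic.client.uri": "user:password@localhost:8003",
        "spark.marklogic.read.documents.uris": "/author/author1.json",
    }
    opts = with_invoked_module(base, "/example/find-related-uris.sjs")
    print(opts["spark.marklogic.read.secondaryUris.invoke"])  # prints /example/find-related-uris.sjs
```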

You can specify local file paths containing either JavaScript or XQuery code via the `spark.marklogic.read.secondaryUris.javascriptFile` and `spark.marklogic.read.secondaryUris.xqueryFile` options:
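
The sketch below picks one of the two file-based options based on the file's extension. That extension convention belongs only to the hypothetical `file_query_options` helper. The connector is told the language by which of the two options you set, not by the file name.

```python
import os


def file_query_options(path: str) -> dict:
    """Map a local query file to the matching secondary URIs file option."""
    ext = os.path.splitext(path)[1].lower()
    if ext in (".js", ".sjs"):
        return {"spark.marklogic.read.secondaryUris.javascriptFile": path}
    if ext in (".xqy", ".xq"):
        return {"spark.marklogic.read.secondaryUris.xqueryFile": path}
    raise ValueError(f"cannot tell whether {path!r} holds JavaScript or XQuery")


if __name__ == "__main__":
    print(file_query_options("queries/find-related.xqy"))
```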

You can pass external variables to your secondary query code by configuring options whose names start with `spark.marklogic.read.secondaryUris.vars.`; the remainder of the option name is used as the external variable name:
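
The sketch below builds those prefixed options from a plain dictionary. The `collection` variable name and the `external_var_options` helper are hypothetical. Values are converted to strings because Spark reader options are strings. How the secondary query code declares and interprets each variable is up to that code.

```python
VARS_PREFIX = "spark.marklogic.read.secondaryUris.vars."


def external_var_options(variables: dict) -> dict:
    """Prefix each variable name so it is passed to the secondary query as an external variable."""
    return {VARS_PREFIX + name: str(value) for name, value in variables.items()}


if __name__ == "__main__":
    # A JavaScript secondary query could then reference a variable named
    # `collection` (hypothetical), for example after declaring `var collection;`.
    print(external_var_options({"collection": "author"}))
```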

Secondary URIs queries are particularly useful for:

- **Document relationships**: Finding documents that reference or are referenced by your initial set
- **Hierarchical data**: Retrieving parent or child documents in a hierarchy
- **Cross-references**: Finding documents that share common property values
- **Graph traversal**: Following relationships between documents to expand your result set
- **Data enrichment**: Adding related documents to provide fuller context for analysis

The secondary query is executed after the primary document selection, allowing you to build complex multi-step queries that would be difficult to express in a single MarkLogic search operation.

## Tuning performance

The connector mimics the behavior of the [MarkLogic Data Movement SDK](https://docs.marklogic.com/guide/java/data-movement)

docs/writing.md

| spark.marklogic.write.embedder.embedding.name | Allows for the embedding name to be customized when the embedding is added to a JSON or XML chunk. |
| spark.marklogic.write.embedder.embedding.namespace | Allows for an optional namespace to be assigned to the embedding element in an XML chunk. |
| spark.marklogic.write.embedder.batchSize | Defines the number of chunks to send to the embedding model in a single call. Defaults to 1. |
| spark.marklogic.write.embedder.prompt | New in 2.7.0 - optional prompt to prepend to the text sent to the embedding model. Useful for providing context or other information to the embedding model. |
| spark.marklogic.write.embedder.base64encode | New in 2.7.0 - encodes each vector produced by the embedding model using a format compliant with the new vector encoding functions in MarkLogic 12. Useful for reducing the size of vectors in your documents. |
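
The sketch below shows how the two new options might sit alongside `batchSize` in a writer's options. The helper name and the prompt text are hypothetical. Passing the boolean as the string `"true"` follows the usual Spark option convention and is not stated in this table. The other embedder options, including whatever configures the embedding model itself, are omitted.

```python
def embedder_extra_options(prompt: str = "", base64_encode: bool = False, batch_size: int = 1) -> dict:
    """Build a subset of the embedder options; other embedder settings are omitted."""
    options = {
        # Number of chunks sent to the embedding model per call (defaults to 1).
        "spark.marklogic.write.embedder.batchSize": str(batch_size),
    }
    if prompt:
        # Optional text prepended to the text sent to the embedding model.
        options["spark.marklogic.write.embedder.prompt"] = prompt
    if base64_encode:
        # Request the MarkLogic 12 compatible vector encoding.
        options["spark.marklogic.write.embedder.base64encode"] = "true"
    return options


if __name__ == "__main__":
    opts = embedder_extra_options("Represent this passage for retrieval:", base64_encode=True, batch_size=10)
    for key, value in sorted(opts.items()):
        print(key, "=", value)
```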