README.md (27 additions, 26 deletions)
@@ -77,29 +77,30 @@ When a document is received and written by the connector, you'll see logging like

| offset.flush.interval.ms | 10000 | Interval at which to try committing offsets for tasks. |
#### MarkLogic-specific properties are defined in config/marklogic-sink.properties
| Property | Default Value | Description |
|:-------- |:--------------|:------------|
| name | marklogic-sink | The name of the connector |
| connector.class | com.marklogic.kafka.connect.sink.MarkLogicSinkConnector | The FQ name of the connector class |
| tasks.max | 1 | The maximum number of concurrent tasks |
| topics | marklogic | The name of the topic(s) to subscribe to |
| ml.connection.host | localhost | A MarkLogic host to connect to. The connector uses the Data Movement SDK, and thus it will connect to each of the hosts in a cluster. |
| ml.connection.port | 8000 | The port of a REST API server to connect to. |
| ml.connection.database | Documents | Optional - the name of a database to connect to. If your REST API server has a content database matching the one that you want to write documents to, you do not need to set this. |
| ml.connection.type | (empty) | Optional - set to "gateway" when using a load balancer; otherwise leave blank. See https://docs.marklogic.com/guide/java/data-movement#id_26583 for more information. |
| ml.connection.securityContextType | DIGEST | Either DIGEST, BASIC, CERTIFICATE, KERBEROS, or NONE |
| ml.connection.certFile | (empty) | Certificate file for certificate-based authentication |
| ml.connection.certPassword | (empty) | Certificate password for certificate-based authentication |
| ml.connection.externalName | (empty) | The external name to use to connect to MarkLogic |
| ml.connection.simpleSsl | false | Set to "true" for a "simple" SSL strategy that uses the JVM's default SslContext and X509TrustManager and a "trust everything" HostnameVerifier. Further customization of an SSL connection via properties is not supported. If you need to do so, consider using the source code for this connector as a starting point. |
| ml.dmsdk.batchSize | 100 | Sets the number of documents to be written in a batch to MarkLogic. This may not have any impact depending on how the connector receives data from Kafka, as the connector calls flushAsync on the DMSDK WriteBatcher after processing every collection of records. Thus, if the connector never receives at one time more than the value of this property, then the value of this property will have no impact. |
| ml.dmsdk.threadCount | 8 | Sets the number of threads used by the Data Movement SDK for parallelizing writes to MarkLogic. Similar to the batch size property above, this may never come into play depending on how many records the connector receives at once. |
| ml.document.collections | kafka-data | Optional - a comma-separated list of collections that each document should be written to |
| ml.document.addTopicToCollections | false | Set this to true so that the name of the topic that the connector reads from is added as a collection to each document inserted by the connector |
| ml.document.temporalCollection | (empty) | Specify the name of a temporal collection for documents to be inserted into |
| ml.document.format | JSON | Optional - specify the format of each document; either JSON, XML, BINARY, TEXT, or UNKNOWN |
| ml.document.mimeType | (empty) | Optional - specify a mime type for each document; typically the format property above will be used instead of this |
| ml.document.permissions | rest-reader,read,rest-writer,update | Optional - a comma-separated list of roles and capabilities that define the permissions for each document written to MarkLogic |
| ml.document.uriPrefix | /kafka-data/ | Optional - a prefix to prepend to each URI; the URI itself is a UUID |
| ml.document.uriSuffix | .json | Optional - a suffix to append to each URI |
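As a sketch of how the URI-related properties combine, the documented behavior is that each document URI is a generated UUID wrapped by the configured `ml.document.uriPrefix` and `ml.document.uriSuffix`. The class and method names below are illustrative, not the connector's actual code:

```java
import java.util.UUID;

public class UriExample {

    // Illustrative only: mirrors the documented behavior where the URI body is
    // a UUID, surrounded by ml.document.uriPrefix and ml.document.uriSuffix.
    static String buildUri(String prefix, String suffix) {
        return prefix + UUID.randomUUID() + suffix;
    }

    public static void main(String[] args) {
        // With the default prefix "/kafka-data/" and suffix ".json":
        System.out.println(buildUri("/kafka-data/", ".json"));
    }
}
```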
 # Optional - set this to true so that the name of the topic that the connector reads from is added as a collection to each document inserted by the connector
 ml.document.addTopicToCollections=false
 
+# Specify the name of a temporal collection for documents to be inserted into
+ml.document.temporalCollection=
+
 # Optional - specify the format of each document; either JSON, XML, BINARY, TEXT, or UNKNOWN
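The `ml.document.addTopicToCollections` flag implies collection-building logic along these lines: the configured comma-separated collections, plus the source topic's name when the flag is set. This is a hedged sketch under that assumption, not the connector's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class CollectionsExample {

    // Sketch only: combines ml.document.collections (comma-separated) with the
    // Kafka topic name when ml.document.addTopicToCollections is true.
    static List<String> buildCollections(String configured, boolean addTopic, String topic) {
        List<String> result = new ArrayList<>();
        if (configured != null && !configured.isEmpty()) {
            for (String collection : configured.split(",")) {
                result.add(collection.trim());
            }
        }
        if (addTopic) {
            result.add(topic);
        }
        return result;
    }

    public static void main(String[] args) {
        // With the defaults from the table above and a topic named "marklogic":
        System.out.println(buildCollections("kafka-data", true, "marklogic"));
        // prints [kafka-data, marklogic]
    }
}
```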
@@ -69,13 +70,14 @@ public class MarkLogicSinkConfig extends AbstractConfig {
     .define(DATAHUB_FLOW_NAME, Type.STRING, null, Importance.MEDIUM, "Name of a Data Hub flow to run")
     .define(DATAHUB_FLOW_STEPS, Type.STRING, null, Importance.MEDIUM, "Comma-delimited names of steps to run")
     .define(DATAHUB_FLOW_LOG_RESPONSE, Type.BOOLEAN, false, Importance.LOW, "If set to true, the response from running a flow on each ingested batch will be logged at the info level")
-    .define(DMSDK_BATCH_SIZE, Type.INT, 100, Importance.HIGH, "Number of documents to write in each batch")
-    .define(DMSDK_THREAD_COUNT, Type.INT, 8, Importance.HIGH, "Number of threads for DMSDK to use")
-    .define(DMSDK_TRANSFORM, Type.STRING, "", Importance.MEDIUM, "Name of a REST transform to use when writing documents")
-    .define(DMSDK_TRANSFORM_PARAMS, Type.STRING, "", Importance.MEDIUM, "Delimited set of transform names and values")
+    .define(DMSDK_BATCH_SIZE, Type.INT, null, Importance.HIGH, "Number of documents to write in each batch")
+    .define(DMSDK_THREAD_COUNT, Type.INT, null, Importance.HIGH, "Number of threads for DMSDK to use")
+    .define(DMSDK_TRANSFORM, Type.STRING, null, Importance.MEDIUM, "Name of a REST transform to use when writing documents")
+    .define(DMSDK_TRANSFORM_PARAMS, Type.STRING, null, Importance.MEDIUM, "Delimited set of transform names and values")
     .define(DMSDK_TRANSFORM_PARAMS_DELIMITER, Type.STRING, ",", Importance.LOW, "Delimiter for transform parameter names and values; defaults to a comma")
     .define(DOCUMENT_COLLECTIONS_ADD_TOPIC, Type.BOOLEAN, false, Importance.LOW, "Indicates if the topic name should be added to the set of collections for a document")
     .define(DOCUMENT_COLLECTIONS, Type.STRING, "", Importance.MEDIUM, "String-delimited collections to add each document to")
+    .define(DOCUMENT_TEMPORAL_COLLECTION, Type.STRING, null, Importance.LOW, "Specify the name of a temporal collection for documents to be inserted into")
     .define(DOCUMENT_FORMAT, Type.STRING, "", Importance.LOW, "Defines format of each document; can be one of json, xml, text, binary, or unknown")
     .define(DOCUMENT_MIMETYPE, Type.STRING, "", Importance.LOW, "Defines the mime type of each document; optional, and typically the format is set instead of the mime type")
     .define(DOCUMENT_PERMISSIONS, Type.STRING, "", Importance.MEDIUM, "String-delimited permissions to add to each document; role1,capability1,role2,capability2,etc")
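Both DOCUMENT_PERMISSIONS ("role1,capability1,role2,capability2") and DMSDK_TRANSFORM_PARAMS use the same delimited name/value convention. A minimal sketch of parsing such a string into ordered pairs, assuming a plain token-splitting approach (this is not the connector's actual parsing code, and a real implementation would need to allow repeated roles):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DelimitedPairsExample {

    // Sketch only: splits "name1,value1,name2,value2" on the configured
    // delimiter and pairs up adjacent tokens. Note that String.split treats
    // the delimiter as a regex, so "," works as-is.
    static Map<String, String> parsePairs(String value, String delimiter) {
        Map<String, String> pairs = new LinkedHashMap<>();
        if (value == null || value.isEmpty()) {
            return pairs;
        }
        String[] tokens = value.split(delimiter);
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("Expected an even number of tokens: " + value);
        }
        for (int i = 0; i < tokens.length; i += 2) {
            pairs.put(tokens[i].trim(), tokens[i + 1].trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        // The default permissions string from the table above:
        System.out.println(parsePairs("rest-reader,read,rest-writer,update", ","));
        // prints {rest-reader=read, rest-writer=update}
    }
}
```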