
Commit 2127634

Reworked docs for config options
Resolves #57.

- Updated docs for each config option so they're the same in all 3 places - README, marklogic-sink.properties, and MarkLogicSinkConfig
- Updated the CONTRIBUTING guide for testing with Apache Kafka
- Updated the README so it is focused on using the connectors as opposed to developing/testing them (more updates will happen to this before 1.7.0 is released)
1 parent 9d81f46 commit 2127634

File tree

11 files changed: +357 -334 lines changed


AWS-CloudFormation/s3Resources/marklogic-sink.properties

Lines changed: 1 addition & 85 deletions
@@ -2,105 +2,21 @@
 
 name=marklogic-sink
 connector.class=com.marklogic.kafka.connect.sink.MarkLogicSinkConnector
-
-# Should only need one task since it's using a WriteBatcher, which is multi-threaded
 tasks.max=1
-
 # Topics to consume from [comma separated list for multiple topics]
 topics=marklogic
 
-
 # MarkLogic connector-specific properties
+# See ./config/marklogic-sink.properties for information on each of these
 
-# A MarkLogic host to connect to. The connector uses the Data Movement SDK, and thus it will connect to each of the
-# hosts in a cluster.
 ml.connection.host=172.31.48.57
-
-# The port of a REST API server to connect to.
 ml.connection.port=8003
-
-# Optional - the name of a database to connect to. If your REST API server has a content database matching that of the
-# one that you want to write documents to, you do not need to set this.
 ml.connection.database=Kafka
-
-# Optional - set to "gateway" when using a load balancer, else leave blank.
-# See https://docs.marklogic.com/guide/java/data-movement#id_26583 for more information.
-ml.connection.type=
-
-# Either DIGEST, BASIC, CERTIFICATE, KERBEROS, or NONE
 ml.connection.securityContextType=DIGEST
-
-# Set these based on the security context type defined above
 ml.connection.username=admin
 ml.connection.password=admin
-ml.connection.certFile=
-ml.connection.certPassword=
-ml.connection.externalName=
-
-# Set "ml.connection.simpleSsl" to "true" for a "simple" SSL strategy that uses the JVM's default SslContext and
-# X509TrustManager and a "trust everything" HostnameVerifier. Further customization of an SSL connection via properties
-# is not supported. If you need to do so, consider using the source code for this connector as a starting point.
-ml.connection.simpleSsl=false
-# You must also ensure that the server cert or the signing CA cert is imported in the JVMs cacerts file.
-# These commands may be used to get the server cert and to import it into your cacerts file.
-# Don't forget to customize the commands for your particular case.
-# openssl x509 -in <(openssl s_client -connect <server>:8004 -prexit 2>/dev/null) -out ~/example.crt
-# sudo keytool -importcert -file ~/example.crt -alias <server> -keystore /path/to/java/lib/security/cacerts -storepass <storepass-password>
-
-# Sets the number of documents to be written in a batch to MarkLogic. This may not have any impact depending on the
-# connector receives data from Kafka, as the connector calls flushAsync on the DMSDK WriteBatcher after processing every
-# collection of records. Thus, if the connector never receives at one time more than the value of this property, then
-# the value of this property will have no impact.
-ml.dmsdk.batchSize=100
-
-# Sets the number of threads used by the Data Movement SDK for parallelizing writes to MarkLogic. Similar to the batch
-# size property above, this may never come into play depending on how many records the connector receives at once.
-ml.dmsdk.threadCount=8
-
-# Optional - a comma-separated list of collections that each document should be written to
 ml.document.collections=kafka-data
-
-# Optional - set this to true so that the name of the topic that the connector reads from is added as a collection to each document inserted by the connector
-ml.document.addTopicToCollections=false
-
-# Optional - specify the format of each document; either JSON, XML, BINARY, TEXT, or UNKNOWN
 ml.document.format=JSON
-
-# Optional - specify a mime type for each document; typically the format property above will be used instead of this
-ml.document.mimeType=
-
-# Optional - a comma-separated list of roles and capabilities that define the permissions for each document written to MarkLogic
 ml.document.permissions=rest-reader,read,rest-writer,update
-
-# Optional - a prefix to prepend to each URI; the URI itself is a UUID
 ml.document.uriPrefix=/kafka-data/
-
-# Optional - a suffix to append to each URI
 ml.document.uriSuffix=.json
-
-# Optional - name of a REST transform to use when writing documents
-# For Data Hub, can use mlRunIngest
-ml.dmsdk.transform=
-
-# Optional - delimited set of transform names and values
-# Data Hub example = flow-name,ingestion_mapping_mastering-flow,step,1
-ml.dmsdk.transformParams=
-
-# Optional - delimiter for transform parameter names and values
-ml.dmsdk.transformParamsDelimiter=,
-
-# Properties for running a Data Hub flow
-# Using examples/dh-5-example in the DH project, could use the following config:
-# ml.datahub.flow.name=ingestion_mapping_mastering-flow
-# ml.datahub.flow.steps=2,3,4
-ml.datahub.flow.name=
-ml.datahub.flow.steps=
-# Whether or not the response data from running a flow should be logged at the info level
-ml.datahub.flow.logResponse=true
-
-ml.id.strategy=
-ml.id.strategy.paths=
-ml.connection.enableCustomSsl=false
-ml.connection.customSsl.tlsVersion=
-ml.connection.customSsl.hostNameVerifier=
-ml.connection.customSsl.mutualAuth=false
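For quick reference, this is the trimmed configuration the hunk leaves behind, assembled only from the context and added lines above (the file's first line falls outside the hunk and is not shown); per the added comment, the per-option documentation now lives in ./config/marklogic-sink.properties rather than here:

```properties
name=marklogic-sink
connector.class=com.marklogic.kafka.connect.sink.MarkLogicSinkConnector
tasks.max=1
# Topics to consume from [comma separated list for multiple topics]
topics=marklogic

# MarkLogic connector-specific properties
# See ./config/marklogic-sink.properties for information on each of these

ml.connection.host=172.31.48.57
ml.connection.port=8003
ml.connection.database=Kafka
ml.connection.securityContextType=DIGEST
ml.connection.username=admin
ml.connection.password=admin
ml.document.collections=kafka-data
ml.document.format=JSON
ml.document.permissions=rest-reader,read,rest-writer,update
ml.document.uriPrefix=/kafka-data/
ml.document.uriSuffix=.json
```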

CONTRIBUTING.md

Lines changed: 56 additions & 4 deletions
@@ -1,4 +1,5 @@
-This guide describes how to develop and contribute pull requests to this connector.
+This guide describes how to develop and contribute pull requests to this connector. The focus is currently on how to
+develop and test the connector, either via a local install of Confluent Platform or of the regular Kafka distribution.
 
 # Running the test suite
 
@@ -27,8 +28,11 @@ Alternatively, you can import this project into an IDE such as IntelliJ and run
 # Testing with Confluent Platform
 
 [Confluent Platform](https://docs.confluent.io/platform/7.2.1/overview.html) provides an easy mechanism for running
-Kafka locally via a single application. To try out the MarkLogic Kafka connector via the Confluent Platform, follow
-the steps below.
+Kafka locally via a single application. A primary benefit of testing with Confluent Platform is to test configuring the
+MarkLogic Kafka connector via the [Confluent Control Center](https://docs.confluent.io/platform/current/control-center/index.html)
+web application.
+
+To try out the MarkLogic Kafka connector via the Confluent Platform, follow the steps below.
 
 ## Install Confluent Platform with the MarkLogic Kafka connector
 
@@ -173,4 +177,52 @@ services (sometimes Schema Registry, sometimes Control Center) usually stops wor
 
 # Testing with Apache Kafka
 
-TODO, will borrow a lot of content from the README.
+The primary reason to test the MarkLogic Kafka connector via a regular Kafka distribution is that the development
+cycle is much faster and more reliable - i.e. you can repeatedly redeploy the connector and restart Kafka Connect to
+test changes, and Kafka Connect will continue to work fine. This is particularly useful when the changes you're testing
+do not require testing the GUI provided by Confluent Control Center.
+
+To get started, these instructions assume that you already have an instance of Apache Kafka installed; the
+[Kafka Quickstart](https://kafka.apache.org/quickstart) instructions provide an easy way of accomplishing this. Perform
+step 1 of these instructions before proceeding.
+
+Next, configure your Gradle properties to point to your Kafka installation and deploy the connector there:
+
+1. Configure `kafkaHome` in gradle-local.properties - e.g. `kafkaHome=/Users/myusername/kafka_2.13-2.8.1`
+2. Configure `kafkaMlUsername` and `kafkaMlPassword` in gradle-local.properties, setting these to a MarkLogic user that
+   is able to write documents to MarkLogic. These values will be used to populate the
+   `ml.connection.username` and `ml.connection.password` connector properties.
+3. Run `./gradlew clean deploy` to build a jar and copy it and the config property files to your Kafka installation
+
+[Step 2 in the Kafka Quickstart guide](https://kafka.apache.org/quickstart) provides the instructions for starting the
+separate Zookeeper and Kafka server processes. You'll need to run these commands from your Kafka installation
+directory. As of August 2022, those commands are (these seem very unlikely to change and thus are included here for
+convenience):
+
+    bin/zookeeper-server-start.sh config/zookeeper.properties
+
+and
+
+    bin/kafka-server-start.sh config/server.properties
+
+Next, start the Kafka connector in standalone mode (also from the Kafka home directory):
+
+    bin/connect-standalone.sh config/marklogic-connect-standalone.properties config/marklogic-sink.properties
+
+You'll see a fair amount of logging from Kafka itself; near the end of the logging, look for messages from
+`MarkLogicSinkTask` and MarkLogic Java Client classes such as `WriteBatcherImpl` to ensure that the connector has
+started up correctly.
+
+To test out the connector, you can use the following command to enter a CLI that allows you to manually send
+messages to the `marklogic` topic that the connector is configured by default to read from:
+
+    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic marklogic
+
+Be sure that the messages you send are consistent with your configuration properties - i.e. if you've set a format of
+JSON, you should send properly formed JSON objects.
+
+When a document is received and written by the connector, you'll see logging like this:
+
+```
+[2018-12-20 12:54:13,561] INFO flushing 1 queued docs (com.marklogic.client.datamovement.impl.WriteBatcherImpl:549)
+```
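To make the console-producer step concrete, here is a minimal sketch of a test session; the JSON payload is an invented example, and any well-formed JSON object will do since the sink properties above set `ml.document.format=JSON`. Given `ml.document.uriPrefix=/kafka-data/` and `ml.document.uriSuffix=.json`, each message should be written to MarkLogic as a document with a URI of the form `/kafka-data/<UUID>.json` in the `kafka-data` collection.

```
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic marklogic
>{"id": "test-1", "source": "kafka-console-producer", "content": "hello MarkLogic"}
```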
