This guide covers how to develop and test this project. It assumes that you have cloned this repository to your local
workstation.
33
**You must use Java 17 for developing, testing, and building this project**, even though the connector supports
running on Java 11. For users, Java 17 is only required if using the splitting and embedding features, as those
depend on a third-party module that requires Java 17.

**You also need Java 11 installed** so that the subprojects in this repository that require Java 11 have access to a
Java 11 SDK. [sdkman](https://sdkman.io/) is highly recommended for installing multiple JDKs.
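
For example, installing and switching between the two JDKs with sdkman looks roughly like the following; the exact version identifiers below are placeholders, so run `sdk list java` to see what is currently available:

```shell
# Placeholder version identifiers - check `sdk list java` for the real ones
sdk install java 17.0.11-tem
sdk install java 11.0.23-tem
# Make Java 17 the active JDK in the current shell
sdk use java 17.0.11-tem
```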

# Setup

To run the tests against the test application, run the following Gradle task:

    ./gradlew test

**To run the tests in Intellij**, you must configure your JUnit template to include a few JVM args:

1. Go to Run -> Edit Configurations.
2. Delete any JUnit configurations you already have.
3. Click on "Edit configuration templates" and click on "JUnit".
4. Click on "Modify options" and select "Add VM options" if it's not already selected.
5. In the VM options text input, add the following:
   `--add-exports=java.base/sun.nio.ch=ALL-UNNAMED --add-exports=java.base/sun.util.calendar=ALL-UNNAMED --add-exports=java.base/sun.security.action=ALL-UNNAMED`
6. Click "Apply".
7. In the dropdown that has "Class" selected, change that to "Method" and hit "Apply" again.

You may need to repeat steps 6 and 7. I've found Intellij to be a little finicky about actually applying these changes.

The net effect should be that when you run a JUnit class, method, or suite of tests, those VM options are automatically
added to the run configuration that Intellij creates for the class/method/suite. Those VM options are required to give
Spark access to certain JVM modules. They are applied automatically when running the tests via Gradle.
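
For reference, registering such options in a Gradle `test` task typically looks like the following sketch; this is an illustration of the pattern, not necessarily the exact contents of this repository's build file:

```groovy
test {
    // Open the JDK-internal packages that Spark's serialization code touches.
    jvmArgs(
        "--add-exports=java.base/sun.nio.ch=ALL-UNNAMED",
        "--add-exports=java.base/sun.util.calendar=ALL-UNNAMED",
        "--add-exports=java.base/sun.security.action=ALL-UNNAMED"
    )
}
```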

**Alternatively**, you can open Preferences in Intellij and go to
"Build, Execution, and Deployment" -> "Build Tools" -> "Gradle". Then change "Build and run using" and "Run tests using"
to "Gradle". This should result in Intellij using the `test` configuration in the `marklogic-spark-connector/build.gradle`
file that registers the required JVM options, allowing the tests to run on Java 17.

## Testing text classification

See the `ClassifyAdHocTest` class for instructions on how to test the text classification feature with a
valid connection to Semaphore.

## Generating code quality reports with SonarQube

Please see our internal Wiki page - search for "Developer Experience SonarQube" -
for information on setting up SonarQube and using it with this repository.

# Testing with PySpark

This will produce a single jar file for the connector in the `./build/libs` directory.

You can then launch PySpark with the connector available via:

    pyspark --jars marklogic-spark-connector/build/libs/marklogic-spark-connector-2.6-SNAPSHOT.jar

The below command is an example of loading data from the test application deployed via the instructions at the top of
this page.
When you run PySpark, it will create its own Spark cluster. If you'd like to try against a separate Spark cluster
that still runs on your local machine, perform the following steps:

1. Use [sdkman to install Spark](https://sdkman.io/sdks#spark). Run `sdk install spark 3.5.5` since we are currently
building against Spark 3.5.5.
2. `cd ~/.sdkman/candidates/spark/current/sbin`, which is where sdkman will install Spark.
3. Run `./start-master.sh` to start a master Spark node.
4. `cd ../logs` and open the master log file that was created to find the address for the master node. It will be in a
format such as `spark://NYWHYC3G0W:7077`.
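
Assembled, steps 2 through 4 look roughly like this; `start-worker.sh` is Spark's standard standalone worker script, and the master URL is a placeholder you should replace with the one from your master log:

```shell
cd ~/.sdkman/candidates/spark/current/sbin
./start-master.sh
# After finding the master address in ../logs, start a worker against it (placeholder URL):
./start-worker.sh spark://localhost:7077
```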
The Spark master GUI is at <http://localhost:8080>. You can use this to view details about the cluster.

Now that you have a Spark cluster running, you just need to tell PySpark to connect to it:

    pyspark --master spark://NYWHYC3G0W:7077 --jars marklogic-spark-connector/build/libs/marklogic-spark-connector-2.6-SNAPSHOT.jar

You can then run the same commands as shown in the PySpark section above. The Spark master GUI will allow you to
examine details of each of the commands that you run.
You will need the connector jar available, so run `./gradlew clean shadowJar` if you have not already done so.
You can then run a test Python program in this repository via the following (again, change the master address as
needed); note that you run this outside of PySpark, and `spark-submit` is available after having installed PySpark:

    spark-submit --master spark://NYWHYC3G0W:7077 --jars marklogic-spark-connector/build/libs/marklogic-spark-connector-2.6-SNAPSHOT.jar marklogic-spark-connector/src/test/python/test_program.py

You can also test a Java program. To do so, first move the `com.marklogic.spark.TestProgram` class from `src/test/java`
to `src/main/java`. Then run the following:

```
./gradlew clean shadowJar
cd marklogic-spark-connector
spark-submit --master spark://NYWHYC3G0W:7077 --class com.marklogic.spark.TestProgram build/libs/marklogic-spark-connector-2.6-SNAPSHOT.jar
```

Be sure to move `TestProgram` back to `src/test/java` when you are done.