
Commit afd19a3

Merge pull request #328 from marklogic/release/2.4.2
Merge 2.4.2 into master
2 parents: 7ff9e0f + 8454c58

22 files changed: +306 −34 lines changed

build.gradle

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ plugins {
 }
 
 group 'com.marklogic'
-version '2.4.1'
+version '2.4.2'
 
 java {
     // To support reading RDF files, Apache Jena is used - but that requires Java 11.

docs/getting-started/jupyter.md

Lines changed: 2 additions & 2 deletions
@@ -32,15 +32,15 @@ connector and also to initialize Spark:
 
 ```
 import os
-os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars "/path/to/marklogic-spark-connector-2.4.1.jar" pyspark-shell'
+os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars "/path/to/marklogic-spark-connector-2.4.2.jar" pyspark-shell'
 
 from pyspark.sql import SparkSession
 spark = SparkSession.builder.master("local[*]").appName('My Notebook').getOrCreate()
 spark.sparkContext.setLogLevel("WARN")
 spark
 ```
 
-The path of `/path/to/marklogic-spark-connector-2.4.1.jar` should be changed to match the location of the connector
+The path of `/path/to/marklogic-spark-connector-2.4.2.jar` should be changed to match the location of the connector
 jar on your filesystem. You are free to customize the `spark` variable in any manner you see fit as well.
 
 Now that you have an initialized Spark session, you can run any of the examples found in the
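For context on what those examples look like, here is a minimal sketch of a read using the initialized session. The connection URI matches the getting-started example application used elsewhere in this release; the Optic query against an `example`/`employee` view is an illustrative assumption, not part of this diff.

```python
# A minimal sketch: read rows from MarkLogic via an Optic query.
# The connection URI matches the getting-started example app; the
# 'example'/'employee' view is assumed here for illustration.
df = spark.read.format("marklogic") \
    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
    .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')") \
    .load()
df.show()
```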

docs/getting-started/pyspark.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ shell by pressing `ctrl-D`.
 
 Run PySpark from the directory that you downloaded the connector to per the [setup instructions](setup.md):
 
-    pyspark --jars marklogic-spark-connector-2.4.1.jar
+    pyspark --jars marklogic-spark-connector-2.4.2.jar
 
 The `--jars` command line option is PySpark's method for utilizing Spark connectors. Each Spark environment should have
 a similar mechanism for including third party connectors; please see the documentation for your particular Spark
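Once the shell starts, a quick smoke test confirms the jar was picked up. This sketch assumes the getting-started example application from setup.md is running on port 8003 and that its `example`/`employee` view exists:

```python
# Inside the PySpark shell, the `spark` session already exists.
# Connection details and the Optic view are illustrative assumptions.
df = spark.read.format("marklogic") \
    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
    .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')") \
    .load()
print(df.count())
```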

docs/getting-started/setup.md

Lines changed: 2 additions & 2 deletions
@@ -31,10 +31,10 @@ have an instance of MarkLogic running, you can skip step 4 below, but ensure tha
 extracted directory contains valid connection properties for your instance of MarkLogic.
 
 1. From [this repository's Releases page](https://github.com/marklogic/marklogic-spark-connector/releases), select
-   the latest release and download the `marklogic-spark-getting-started-2.4.1.zip` file.
+   the latest release and download the `marklogic-spark-getting-started-2.4.2.zip` file.
 2. Extract the contents of the downloaded zip file.
 3. Open a terminal window and go to the directory created by extracting the zip file; the directory should have a
-   name of "marklogic-spark-getting-started-2.4.1".
+   name of "marklogic-spark-getting-started-2.4.2".
 4. Run `docker-compose up -d` to start an instance of MarkLogic
 5. Ensure that the `./gradlew` file is executable; depending on your operating system, you may need to run
    `chmod 755 gradlew` to make the file executable.

examples/entity-aggregation/build.gradle

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ repositories {
 
 dependencies {
     implementation 'org.apache.spark:spark-sql_2.12:3.5.3'
-    implementation "com.marklogic:marklogic-spark-connector:2.4.1"
+    implementation "com.marklogic:marklogic-spark-connector:2.4.2"
     implementation "org.postgresql:postgresql:42.6.2"
 }

examples/getting-started/marklogic-spark-getting-started.ipynb

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
    "source": [
     "# Make the MarkLogic connector available to the underlying PySpark application.\n",
     "import os\n",
-    "os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars \"marklogic-spark-connector-2.4.1.jar\" pyspark-shell'\n",
+    "os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars \"marklogic-spark-connector-2.4.2.jar\" pyspark-shell'\n",
     "\n",
     "# Define the connection details for the getting-started example application.\n",
     "client_uri = \"spark-example-user:password@localhost:8003\"\n",

examples/java-dependency/build.gradle

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ repositories {
 
 dependencies {
     implementation 'org.apache.spark:spark-sql_2.12:3.5.3'
-    implementation 'com.marklogic:marklogic-spark-connector:2.4.1'
+    implementation 'com.marklogic:marklogic-spark-connector:2.4.2'
 }
 
 task runApp(type: JavaExec) {

src/main/java/com/marklogic/spark/ContextSupport.java

Lines changed: 4 additions & 0 deletions
@@ -145,6 +145,10 @@ public final String getStringOption(String option) {
         return hasOption(option) ? properties.get(option).trim() : null;
     }
 
+    public final boolean getBooleanOption(String option, boolean defaultValue) {
+        return hasOption(option) ? Boolean.parseBoolean(getStringOption(option)) : defaultValue;
+    }
+
     public final boolean isStreamingFiles() {
         return "true".equalsIgnoreCase(getStringOption(Options.STREAM_FILES));
     }

src/main/java/com/marklogic/spark/Options.java

Lines changed: 8 additions & 0 deletions
@@ -59,6 +59,14 @@ public abstract class Options {
     public static final String READ_TRIPLES_FILTERED = "spark.marklogic.read.triples.filtered";
     public static final String READ_TRIPLES_BASE_IRI = "spark.marklogic.read.triples.baseIri";
 
+    /**
+     * The connector uses a consistent snapshot by default. Setting this to false results in queries being executed
+     * at multiple points of time, potentially yielding inconsistent results.
+     *
+     * @since 2.4.2
+     */
+    public static final String READ_SNAPSHOT = "spark.marklogic.read.snapshot";
+
     // For logging progress when reading documents, rows, or items via custom code. Defines the interval at which
     // progress should be logged - e.g. a value of 10,000 will result in a message being logged on every 10,000 items
     // being written/processed.
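From the Spark side, the new option is set like any other connector option. Below is a sketch of opting out of the consistent snapshot when reading documents; the option key comes from `Options.READ_SNAPSHOT` in this diff, while the collections option and connection details are illustrative assumptions:

```python
# A sketch of disabling the consistent snapshot (new in 2.4.2).
# "spark.marklogic.read.snapshot" is the constant added above; the
# collections option and connection URI are assumed for illustration.
df = spark.read.format("marklogic") \
    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
    .option("spark.marklogic.read.documents.collections", "employee") \
    .option("spark.marklogic.read.snapshot", "false") \
    .load()
```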

src/main/java/com/marklogic/spark/reader/document/DocumentContext.java

Lines changed: 5 additions & 0 deletions
@@ -99,6 +99,11 @@ int getPartitionsPerForest() {
         return (int) getNumericOption(Options.READ_DOCUMENTS_PARTITIONS_PER_FOREST, defaultPartitionsPerForest, 1);
     }
 
+    boolean isConsistentSnapshot() {
+        // Starting in 2.2.0 and through 2.4.2, the default is a consistent snapshot. We may change this later.
+        return getBooleanOption(Options.READ_SNAPSHOT, true);
+    }
+
     void setLimit(Integer limit) {
         this.limit = limit;
     }
