Commit fb9be37

Merge pull request #395 from marklogic/release/1.3.0

Merge release/1.3.0 into main

2 parents 134cce5 + 140d641

138 files changed: +2423 additions, -48050 deletions


.gitignore

Lines changed: 1 addition & 0 deletions

@@ -12,3 +12,4 @@ flux/conf
 flux-cli/src/dist/ext/*.jar
 flux-version.properties
 docker/sonarqube
+optionsExperiments

CODEOWNERS

Lines changed: 1 addition & 1 deletion

@@ -2,4 +2,4 @@
 # Each line is a file pattern followed by one or more owners.

 # These owners will be the default owners for everything in the repo.
-* @anu3990 @billfarber @rjrudin
+* @anu3990 @billfarber @rjrudin @stevebio

CONTRIBUTING.md

Lines changed: 10 additions & 41 deletions

@@ -10,7 +10,7 @@ application installed:
 Next, run the following to pull a small model for the test instance of Ollama to use; this will be used by one or more
 embedder tests:

-    docker exec -it flux-ollama-1 ollama pull all-minilm
+    docker exec -it docker-tests-flux-ollama-1 ollama pull all-minilm

 Some of the tests depend on the Postgres instance deployed via Docker. Follow these steps to load a sample dataset
 into it:
@@ -24,11 +24,11 @@ downloading the `dvdrental.zip` and extracting it to produce a file named `dvdre
 Once you have the `dvdrental.tar` file in place, run these commands to load it into Postgres:

 ```
-docker exec -it flux-postgres-1 psql -U postgres -c "CREATE DATABASE dvdrental"
-docker exec -it flux-postgres-1 pg_restore -U postgres -d dvdrental /opt/dvdrental.tar
+docker exec -it docker-tests-flux-postgres-1 psql -U postgres -c "CREATE DATABASE dvdrental"
+docker exec -it docker-tests-flux-postgres-1 pg_restore -U postgres -d dvdrental /opt/dvdrental.tar
 ```

-The Docker file includes a pgadmin instance which can be accessed at <http://localhost:15432/>.
+The Docker file includes a pgadmin instance which can be accessed at <http://localhost:5480/>.
 If you wish to login to this, do so with "postgres@pgadmin.com" and
 a password of "postgres". For logging into Postgres itself, use "postgres" as the username and password. You can then
 register a server that connects to the "postgres" server.
@@ -104,44 +104,13 @@ tests. You do not need to do this if you have Intellij configured to use Gradle

 ## Generating code quality reports with SonarQube

-In order to use SonarQube, you must have used Docker to run this project's `docker-compose.yml` file, and you must
-have the services in that file running. You must also use Java 17 to run the `sonar` Gradle task.
+Please see our internal Wiki page - search for "Developer Experience SonarQube" -
+for information on setting up SonarQube and using it with this repository.

-To configure the SonarQube service, perform the following steps:
-
-1. Go to http://localhost:9000 .
-2. Login as admin/admin. SonarQube will ask you to change this password; you can choose whatever you want ("password" works).
-3. Click on "Create project manually".
-4. Enter "flux" for the Project Name; use that as the Project Key too.
-5. Enter "main" as the main branch name.
-6. Click on "Next".
-7. Click on "Use the global setting" and then "Create project".
-8. On the "Analysis Method" page, click on "Locally".
-9. In the "Provide a token" panel, click on "Generate". Copy the token.
-10. Add `systemProp.sonar.login=your token pasted here` to `gradle-local.properties` in the root of your project, creating
-that file if it does not exist yet.
-
-To run SonarQube, run the following Gradle tasks with Java 17 or higher, which will run all the tests with code
-coverage and then generate a quality report with SonarQube:
-
-    ./gradlew test sonar
-
-If you do not add `systemProp.sonar.login` to your `gradle-local.properties` file, you can specify the token via the
-following:
-
-    ./gradlew test sonar -Dsonar.login=paste your token here
-
-When that completes, you will see a line like this near the end of the logging:
-
-    ANALYSIS SUCCESSFUL, you can find the results at: http://localhost:9000/dashboard?id=flux
-
-Click on that link. If it's the first time you've run the report, you'll see all issues. If you've run the report
-before, then SonarQube will show "New Code" by default. That's handy, as you can use that to quickly see any issues
-you've introduced on the feature branch you're working on. You can then click on "Overall Code" to see all issues.
-
-Note that if you only need results on code smells and vulnerabilities, you can repeatedly run `./gradlew sonar`
-without having to re-run the tests. If you get an error from Sonar about Java sources, you just need to compile the
-Java code, so run `./gradlew compileTestJava sonar`.
+You can run `./gradlew clean testCodeCoverageReport` to run the tests and generate code coverage data. The output will
+be written to `code-coverage-report/build`. Unfortunately though, Sonarqube does not appear to consume this data
+correctly. For example, as of 2025-04-23, the Jacoco test report will show 84% coverage but Sonarqube will only report
+76% coverage.

 ## Testing the documentation locally
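The container renames above (`flux-postgres-1` to `docker-tests-flux-postgres-1`) follow Docker Compose v2's `<project>-<service>-<index>` container naming convention. A minimal sketch of that naming rule; the split into a `docker-tests` project and `flux-postgres`/`flux-ollama` services is an assumption for illustration, as only the full container names appear in the diff:

```shell
#!/bin/sh
# Docker Compose v2 names containers "<project>-<service>-<index>".
# The project/service split below is assumed, not confirmed by the diff.
compose_container_name() {
    project="$1"; service="$2"; index="${3:-1}"
    printf '%s-%s-%s\n' "$project" "$service" "$index"
}

compose_container_name docker-tests flux-postgres   # prints docker-tests-flux-postgres-1
compose_container_name docker-tests flux-ollama     # prints docker-tests-flux-ollama-1
```

This is why every `docker exec` target in the docs and Jenkinsfile changed when the compose project moved.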

Jenkinsfile

Lines changed: 31 additions & 3 deletions

@@ -28,13 +28,14 @@ def runtests(){
     ./gradlew -i mlDeploy;
     wget https://www.postgresqltutorial.com/wp-content/uploads/2019/05/dvdrental.zip;
     unzip dvdrental.zip -d docker/postgres/ ;
-    docker exec -i flux-postgres-1 psql -U postgres -c "CREATE DATABASE dvdrental";
-    docker exec -i flux-postgres-1 pg_restore -U postgres -d dvdrental /opt/dvdrental.tar;
+    docker exec -i docker-tests-flux-postgres-1 psql -U postgres -c "CREATE DATABASE dvdrental";
+    docker exec -i docker-tests-flux-postgres-1 pg_restore -U postgres -d dvdrental /opt/dvdrental.tar;
     cd $WORKSPACE/flux/;
-    ./gradlew --refresh-dependencies clean test || true;
+    ./gradlew --refresh-dependencies clean testCodeCoverageReport || true;
     '''
     junit '**/*.xml'
 }
+
 def postCleanup(){
     sh label:'mlcleanup', script: '''#!/bin/bash
     cd $WORKSPACE/flux;
@@ -45,6 +46,7 @@ def postCleanup(){
     echo "y" | docker volume prune --filter all=1 || true;
     '''
 }
+
 def runSonarScan(String javaVersion){
     sh label:'test', script: '''#!/bin/bash
     export JAVA_HOME=$'''+javaVersion+'''
@@ -54,20 +56,25 @@ def runSonarScan(String javaVersion){
     ./gradlew sonar -Dsonar.projectKey='ML-DevExp-marklogic-flux' -Dsonar.projectName='ML-DevExp-marklogic-flux' || true
     '''
 }
+
 pipeline{
     agent none
+
     options {
         checkoutToSubdirectory 'flux'
         buildDiscarder logRotator(artifactDaysToKeepStr: '7', artifactNumToKeepStr: '', daysToKeepStr: '30', numToKeepStr: '')
     }
+
     environment{
         JAVA_HOME_DIR="/home/builder/java/jdk-11.0.2"
         JAVA17_HOME_DIR="/home/builder/java/jdk-17.0.2"
         GRADLE_DIR =".gradle"
         DMC_USER = credentials('MLBUILD_USER')
         DMC_PASSWORD = credentials('MLBUILD_PASSWORD')
     }
+
     stages{
+
         stage('tests'){
             environment{
                 scannerHome = tool 'SONAR_Progress'
@@ -85,6 +92,25 @@ pipeline{
                 }
             }
         }
+
+        stage('publishApi'){
+            agent {label 'devExpLinuxPool'}
+            when {
+                branch 'develop'
+            }
+            steps{
+                sh label:'publishApi', script: '''#!/bin/bash
+                export JAVA_HOME=`eval echo "$JAVA_HOME_DIR"`;
+                export GRADLE_USER_HOME=$WORKSPACE/$GRADLE_DIR
+                export PATH=$JAVA_HOME/bin:$GRADLE_USER_HOME:$PATH;
+                ./gradlew clean;
+                cp ~/.gradle/gradle.properties $GRADLE_USER_HOME/gradle.properties;
+                cd $WORKSPACE/flux;
+                ./gradlew publish
+                '''
+            }
+        }
+
         stage('publish'){
             agent{ label 'devExpLinuxPool'}
             when {
@@ -116,6 +142,7 @@ pipeline{
                 }
             }
         }
+
         stage('regressions'){
             when{
                 allOf{
@@ -136,5 +163,6 @@ pipeline{
             }
         }
     }
+
 }

NOTICE.txt

Lines changed: 16 additions & 7 deletions

@@ -11,16 +11,18 @@ hadoop-aws 3.3.4 (Apache-2.0)
 hadoop-client 3.3.4 (Apache-2.0)
 marklogic-spark-connector 2.5.1 (Apache-2.0)
 picocli 4.7.6 (Apache-2.0)
-spark-avro_2.12 3.5.3 (Apache-2.0)
-spark-sql_2.12 3.5.3 (Apache-2.0)
+spark-avro_2.12 3.5.5 (Apache-2.0)
+spark-sql_2.12 3.5.5 (Apache-2.0)
+tika-parser-microsoft-module 3.1.0 (Apache-2.0)
+tika-parser-pdf-module 3.1.0 (Apache-2.0)

 Common Licenses

 Apache License 2.0 (Apache-2.0)

 Third-Party Components

-The following is a list of the third-party components used by MarkLogic® Flux™ 1.2.1 (last updated January 7, 2025):
+The following is a list of the third-party components used by MarkLogic® Flux™ 1.3.0 (last updated May 1, 2025):

 aws-java-sdk-s3 1.12.262 (Apache-2.0)
 https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-s3
@@ -34,26 +36,33 @@ hadoop-client 3.3.4 (Apache-2.0)
 https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client
 For the full text of the Apache-2.0 license, see Apache License 2.0 (Apache-2.0)

-marklogic-spark-connector 2.5.1 (Apache-2.0)
+marklogic-spark-connector 2.6.0 (Apache-2.0)
 https://repo1.maven.org/maven2/com/marklogic/marklogic-spark-connector
 For the full text of the Apache-2.0 license, see Apache License 2.0 (Apache-2.0)

 picocli 4.7.6 (Apache-2.0)
 https://repo1.maven.org/maven2/info/picocli/picocli
 For the full text of the Apache-2.0 license, see Apache License 2.0 (Apache-2.0)

-spark-avro_2.12 3.5.3 (Apache-2.0)
+spark-avro_2.12 3.5.5 (Apache-2.0)
 https://repo1.maven.org/maven2/org/apache/spark/spark-avro_2.12
 For the full text of the Apache-2.0 license, see Apache License 2.0 (Apache-2.0)

-spark-sql_2.12 3.5.3 (Apache-2.0)
+spark-sql_2.12 3.5.5 (Apache-2.0)
 https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.12
 For the full text of the Apache-2.0 license, see Apache License 2.0 (Apache-2.0)

+tika-parser-microsoft-module 3.1.0 (Apache-2.0)
+https://repo1.maven.org/maven2/org/apache/tika/tika-parser-microsoft-module/
+For the full text of the Apache-2.0 license, see Apache License 2.0 (Apache-2.0)
+
+tika-parser-pdf-module 3.1.0 (Apache-2.0)
+https://repo1.maven.org/maven2/org/apache/tika/tika-parser-pdf-module/
+For the full text of the Apache-2.0 license, see Apache License 2.0 (Apache-2.0)

 Common Licenses

-This section shows the text of common third-party licenses used by MarkLogic® Flux™ 1.2.1 (last updated January 7, 2025):
+This section shows the text of common third-party licenses used by MarkLogic® Flux™ 1.3.0 (last updated January 7, 2025):

 Apache License 2.0 (Apache-2.0)
 https://spdx.org/licenses/Apache-2.0.html

README.md

Lines changed: 1 addition & 0 deletions

@@ -6,6 +6,7 @@ With Flux, you can automate common data movement use cases including:

 - Importing rows from an RDBMS.
 - Importing JSON, XML, CSV, Parquet and other file types from a local filesystem or S3.
+- Extract text from binary documents and classify it using [Progress Semaphore](https://www.progress.com/semaphore).
 - Implementing a data pipeline for a [RAG solution with MarkLogic](https://www.progress.com/marklogic/solutions/generative-ai).
 - Copying data from one MarkLogic database to another database.
 - Reprocessing data in MarkLogic via custom code.

build.gradle

Lines changed: 79 additions & 2 deletions

@@ -1,13 +1,40 @@
+plugins {
+    id "org.sonarqube" version "6.1.0.5360"
+}
+
+sonar {
+    properties {
+        property "sonar.projectKey", "flux"
+        property "sonar.host.url", "http://localhost:9000"
+        property "sonar.coverage.jacoco.xmlReportPaths", "code-coverage-report/build/reports/jacoco/testCodeCoverageReport/testCodeCoverageReport.xml"
+        // Avoids a warning from Gradle.
+        property "sonar.gradle.skipCompile", "true"
+    }
+}
+
 subprojects {
     apply plugin: "java-library"

     group = "com.marklogic"

     java {
-        sourceCompatibility = 11
-        targetCompatibility = 11
+        // Flux requires Java 11 for all operations besides splitting and embedding, which require Java 17 due to
+        // the requirements of the langchain4j dependency.
+        toolchain {
+            languageVersion = JavaLanguageVersion.of(11)
+        }
     }

+    // Allows for quickly identifying compiler warnings.
+    tasks.withType(JavaCompile) {
+        options.compilerArgs << '-Xlint:unchecked'
+        options.deprecation = true
+    }
+
+    javadoc.failOnError = false
+    // Ignores warnings on params that don't have descriptions, which is a little too noisy
+    javadoc.options.addStringOption('Xdoclint:none', '-quiet')
+
     repositories {
         mavenCentral()
         mavenLocal()
@@ -22,7 +49,42 @@ subprojects {
             details.useVersion '2.15.2'
             details.because 'Need to match the version used by Spark.'
         }
+        if (details.requested.group.equals("org.slf4j")) {
+            details.useVersion "2.0.16"
+            details.because "Ensures that slf4j-api 1.x does not appear on the Flux classpath in particular, which can " +
+                "lead to this issue - https://www.slf4j.org/codes.html#StaticLoggerBinder."
+        }
+        if (details.requested.group.equals("org.apache.logging.log4j")) {
+            details.useVersion "2.24.3"
+            details.because "Need to match the version used by Apache Tika. Spark uses 2.20.0 but automated tests confirm " +
+                "that Spark seems fine with 2.24.3."
+        }
     }
+
+    resolutionStrategy {
+        // By default, Spark 3.5.x does not include the log4j 1.x dependency via its zookeeper dependency. But somehow, by
+        // adding hadoop-client 3.3.4 to the mix, the log4j 1.x dependency comes via the zookeeper 3.6.3 dependency. Per
+        // the release notes at https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html, using zookeeper 3.6.4 - which
+        // removes log4j 1.x, thus avoiding the major CVE associated with log4j 1.x - appears safe, which is confirmed by
+        // tests as well.
+        force "org.apache.zookeeper:zookeeper:3.6.4"
+
+        // Avoids a classpath conflict between Spark and tika-parser-microsoft-module. Forces Spark to use the
+        // version that tika-parser-microsoft-module wants.
+        // Avoids another classpath conflict between Spark and tika-parser-microsoft-module.
+        force "org.apache.commons:commons-compress:1.27.1"
+    }
+
+    // Without this exclusion, we have multiple slf4j providers, leading to an ugly warning at the start
+    // of each Flux execution.
+    exclude group: "org.slf4j", module: "slf4j-reload4j"
+
+    // The rocksdbjni dependency weighs in at 50mb and so far does not appear necessary for our use of Spark.
+    exclude module: "rocksdbjni"
+}
+
+task allDeps(type: DependencyReportTask) {
+    description = "Allows for generating dependency reports for every subproject in a single task."
 }

 test {
@@ -31,6 +93,20 @@ subprojects {
             events 'started', 'passed', 'skipped', 'failed'
             exceptionFormat 'full'
         }
+        jvmArgs = [
+            // Needed for all Java 17 testing.
+            "--add-opens", "java.base/sun.nio.ch=ALL-UNNAMED",
+
+            // For Spark's SerializationDebugger when using Java 17. See ReprocessTest for one example of why this is needed.
+            "--add-opens", "java.base/sun.security.action=ALL-UNNAMED",
+
+            // Needed by the JDBC tests.
+            "--add-opens", "java.base/sun.util.calendar=ALL-UNNAMED",
+
+            // Needed by CustomImportTest
+            "--add-opens", "java.base/java.io=ALL-UNNAMED",
+            "--add-opens", "java.base/sun.nio.cs=ALL-UNNAMED"
+        ]
     }
 }

@@ -39,6 +115,7 @@ task gettingStartedZip(type: Zip) {
         "on the GitHub release page."
     from "examples/getting-started"
     exclude "build", ".gradle", "gradle-*.properties", "flux", ".gitignore", "marklogic-flux"
+    exclude "src/main/ml-schemas/tde/chunks.json"
     into "marklogic-flux-getting-started-${version}"
     archiveFileName = "marklogic-flux-getting-started-${version}.zip"
     destinationDirectory = file("build")
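The `jvmArgs` added to the `test` block are the usual `--add-opens` flags Spark needs on Java 17. A sketch assembling those same flags for a hypothetical direct `java` launch; `flux-cli.jar` is an illustrative name, not a confirmed artifact:

```shell
#!/bin/sh
# The same module opens the test block configures, collected into one variable
# for a hypothetical `java` invocation on Java 17.
JVM_OPENS="--add-opens java.base/sun.nio.ch=ALL-UNNAMED \
--add-opens java.base/sun.security.action=ALL-UNNAMED \
--add-opens java.base/sun.util.calendar=ALL-UNNAMED \
--add-opens java.base/java.io=ALL-UNNAMED \
--add-opens java.base/sun.nio.cs=ALL-UNNAMED"

# Print the command rather than run it, since no JDK or jar is assumed here.
echo "java $JVM_OPENS -jar flux-cli.jar"
```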

code-coverage-report/build.gradle

Lines changed: 23 additions & 0 deletions

@@ -0,0 +1,23 @@
+// See https://docs.gradle.org/current/samples/sample_jvm_multi_project_with_code_coverage_standalone.html
+// for more information on how this file was created.
+
+plugins {
+    id 'jacoco-report-aggregation'
+}
+
+dependencies {
+    jacocoAggregation project(':flux-embedding-model-azure-open-ai')
+    jacocoAggregation project(':flux-embedding-model-minilm')
+    jacocoAggregation project(':flux-embedding-model-ollama')
+    jacocoAggregation project(':flux-tests-api')
+    jacocoAggregation project(':flux-cli')
+    jacocoAggregation project(':flux-java17-tests')
+}
+
+reporting {
+    reports {
+        testCodeCoverageReport(JacocoCoverageReport) {
+            testType = TestSuiteType.UNIT_TEST
+        }
+    }
+}
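The `sonar.coverage.jacoco.xmlReportPaths` property added to the root `build.gradle` must match where this aggregation task writes its XML. A small sketch that checks for the report at that path after running `./gradlew clean testCodeCoverageReport` (run from the repository root; the path comes from the sonar configuration above):

```shell
#!/bin/sh
# Verify the aggregated Jacoco XML exists where the sonar config expects it.
REPORT="code-coverage-report/build/reports/jacoco/testCodeCoverageReport/testCodeCoverageReport.xml"

if [ -f "$REPORT" ]; then
    echo "coverage report found: $REPORT"
else
    echo "no report yet - run ./gradlew clean testCodeCoverageReport first"
fi
```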
