Commit e360f74 — "bundle jar"
1 parent f9b99b7
File tree: 11 files changed, +1149 −0 lines
`.github/workflows/connector-build.yml`
Lines changed: 32 additions & 0 deletions

```yaml
name: connector-build

on:
  push:
    branches:
      - main
    paths:
      - 'neo4j-unity-catalog-connector/**'
  pull_request:
    paths:
      - 'neo4j-unity-catalog-connector/**'

jobs:
  build:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: neo4j-unity-catalog-connector
    steps:
      - uses: actions/checkout@v4
      - name: Set up JDK
        uses: actions/setup-java@v4
        with:
          distribution: zulu
          java-version: 17
      - name: Cache Maven packages
        uses: actions/cache@v4
        with:
          path: ~/.m2
          key: ${{ runner.os }}-m2-${{ hashFiles('neo4j-unity-catalog-connector/pom.xml') }}-${{ github.sha }}
      - name: Clean and verify
        run: ./mvnw -q clean verify
```
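One observation on the cache step: because the key embeds `${{ github.sha }}`, every commit produces a fresh key, so the cache is effectively write-only. A common remedy (a suggestion, not part of this commit) is adding a `restore-keys` prefix fallback so later builds can restore the most recent prior cache:

```yaml
      - name: Cache Maven packages
        uses: actions/cache@v4
        with:
          path: ~/.m2
          key: ${{ runner.os }}-m2-${{ hashFiles('neo4j-unity-catalog-connector/pom.xml') }}-${{ github.sha }}
          # Suggested addition: fall back to the newest cache whose key
          # shares this prefix when no exact match exists.
          restore-keys: |
            ${{ runner.os }}-m2-${{ hashFiles('neo4j-unity-catalog-connector/pom.xml') }}-
            ${{ runner.os }}-m2-
```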
`.github/workflows/connector-release.yml`
Lines changed: 32 additions & 0 deletions

```yaml
name: connector-release

on:
  create:
    tags:
      - 'connector-*'

jobs:
  build_artifact:
    if: (github.event_name == 'create' && github.event.ref_type == 'tag')
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: neo4j-unity-catalog-connector
    steps:
      - uses: actions/checkout@v4
      - name: Set up JDK
        uses: actions/setup-java@v4
        with:
          distribution: zulu
          java-version: 17
      - name: Cache Maven packages
        uses: actions/cache@v4
        with:
          path: ~/.m2
          key: ${{ runner.os }}-m2-${{ hashFiles('neo4j-unity-catalog-connector/pom.xml') }}-${{ github.sha }}
      - name: Clean and verify
        run: ./mvnw -q clean verify
      - name: Release
        uses: softprops/action-gh-release@v2
        with:
          files: neo4j-unity-catalog-connector/target/neo4j-unity-catalog-connector-*.jar
```
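A caveat on the trigger: the `create` event does not document `tags` filters (filters belong to `push`), which is likely why the workflow also guards with the `if:` condition; note that guard checks the ref type but not the `connector-*` name pattern. A more conventional equivalent (a suggestion, not part of this commit) would trigger on tag pushes directly:

```yaml
# Suggested alternative trigger: tag-name filtering is supported here,
# so no separate if: guard on ref_type is needed.
on:
  push:
    tags:
      - 'connector-*'
```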

`.gitignore`
Lines changed: 2 additions & 0 deletions

```diff
@@ -40,6 +40,8 @@ target/
 *.jks
 *.hprof
 dependency-reduced-pom.xml
+# Maven wrapper JAR must be committed
+!**/.mvn/wrapper/maven-wrapper.jar
 *.DS_Store
 *.swp
 *.swo
```
`.mvn/wrapper/maven-wrapper.jar`
58.5 KB — binary file not shown.
`.mvn/wrapper/maven-wrapper.properties`
Lines changed: 18 additions & 0 deletions

```properties
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
distributionUrl=https://repo.maven.apache.org/maven2/org/apache/maven/apache-maven/3.9.6/apache-maven-3.9.6-bin.zip
wrapperUrl=https://repo.maven.apache.org/maven2/org/apache/maven/wrapper/maven-wrapper/3.1.1/maven-wrapper-3.1.1.jar
```
`README.md`
Lines changed: 209 additions & 0 deletions

# Neo4j Unity Catalog Connector

A single shaded (fat) JAR that bundles the Neo4j JDBC driver, the SQL-to-Cypher translator, and the Spark subquery cleaner for use with Databricks Unity Catalog federated queries.

Instead of downloading and uploading two separate JARs (`neo4j-jdbc-full-bundle` + `neo4j-jdbc-translator-sparkcleaner`), users upload this single JAR to a UC Volume and reference one path in their connection configuration.

## Prerequisites

- Java 17+

## Build

```bash
cd neo4j-unity-catalog-connector
./mvnw clean verify
```

The shaded JAR is produced at:

```
target/neo4j-unity-catalog-connector-1.0.0-SNAPSHOT.jar
```

## Run Tests

Tests verify that the bundled translators are discoverable via SPI, that the Spark subquery cleaner handles Databricks/Spark query patterns, and that the JDBC driver class is loadable.

```bash
./mvnw test
```
## Test in Databricks

1. Build the JAR (see above).

2. Upload to a Unity Catalog Volume:

   ```python
   # In a Databricks notebook
   dbutils.fs.cp(
       "file:/path/to/neo4j-unity-catalog-connector-1.0.0-SNAPSHOT.jar",
       "/Volumes/<catalog>/<schema>/jars/neo4j-unity-catalog-connector-1.0.0-SNAPSHOT.jar"
   )
   ```

3. Create a JDBC connection referencing the single JAR:

   ```sql
   CREATE CONNECTION neo4j_connection TYPE JDBC
   ENVIRONMENT (
     java_dependencies '["/Volumes/<catalog>/<schema>/jars/neo4j-unity-catalog-connector-1.0.0-SNAPSHOT.jar"]'
     safespark_memory '800m'
   )
   OPTIONS (
     host '<neo4j-host>',
     port '7687',
     user '<username>',
     password '<password>',
     jdbc_driver 'org.neo4j.jdbc.Neo4jDriver',
     jdbc_url 'jdbc:neo4j://<neo4j-host>:7687?database=neo4j&enableSQLTranslation=true'
   )
   ```

4. Run a federated query:

   ```sql
   SELECT * FROM IDENTIFIER(neo4j_connection.`/`) LIMIT 10;
   ```
## What's Inside

The shaded JAR bundles:

| Dependency | Purpose |
|---|---|
| `neo4j-jdbc` | Core JDBC driver for Neo4j |
| `neo4j-jdbc-translator-impl` | SQL-to-Cypher translation engine |
| `neo4j-jdbc-translator-sparkcleaner` | Cleans Spark subquery wrapping (`SPARK_GEN_SUBQ_0 WHERE 1=0`) |

All transitive dependencies (Jackson, Netty, jOOQ, Bolt protocol, Cypher DSL, Reactive Streams) are relocated under `org.neo4j.jdbc.internal.shaded.*` to avoid classpath conflicts with the Databricks runtime.
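Since everything rides on the shaded JAR being assembled correctly, it can be useful to inspect the artifact before uploading it. A small illustrative Python helper (not part of the project; it only uses the stdlib, and the JAR path in the comment is hypothetical):

```python
import zipfile


def summarize_shaded_jar(jar_path) -> dict:
    """Illustrative check of a shaded JAR: count classes under the
    relocation prefix and list the merged SPI service files."""
    shaded_prefix = "org/neo4j/jdbc/internal/shaded/"
    with zipfile.ZipFile(jar_path) as jar:
        names = jar.namelist()
    return {
        "shaded_classes": sum(
            1 for n in names
            if n.startswith(shaded_prefix) and n.endswith(".class")
        ),
        "service_files": [
            n for n in names if n.startswith("META-INF/services/")
        ],
    }


# Example (path is hypothetical):
# summarize_shaded_jar("target/neo4j-unity-catalog-connector-1.0.0-SNAPSHOT.jar")
```

Any zip-reading tool (`jar tf`, `unzip -l`) gives the same information; the point is that both the relocated classes and the merged service files are plainly visible in the artifact.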
## Design

### Problem

Connecting Neo4j to Databricks Unity Catalog previously required users to download and upload **two separate JARs** to a Unity Catalog Volume:

1. `neo4j-jdbc-full-bundle-6.x.x.jar` — the main JDBC driver with SQL-to-Cypher translation
2. `neo4j-jdbc-translator-sparkcleaner-6.x.x.jar` — handles Spark's subquery wrapping (`SPARK_GEN_SUBQ_0 WHERE 1=0`)

Both had to be referenced individually in the `java_dependencies` array when creating a UC JDBC connection. This meant two manual downloads from Maven Central, two uploads to a Volume, two paths to manage, and two version numbers to keep in sync. If a user forgot the sparkcleaner JAR or used mismatched versions, the connection broke silently, surfacing only as confusing errors.
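To make the subquery-wrapping problem concrete: Spark's JDBC source probes a remote schema by wrapping the pushed-down query in a zero-row subquery, which a Cypher-backed endpoint cannot parse as-is. A toy Python sketch of the kind of rewrite involved (purely illustrative; the real sparkcleaner is a Java translator inside the JDBC driver and is far more robust than one regex):

```python
import re

# Illustrative pattern for Spark's schema probe:
#   SELECT * FROM (<inner>) SPARK_GEN_SUBQ_N WHERE 1=0
_SPARK_WRAP = re.compile(
    r"^\s*SELECT\s+\*\s+FROM\s+\((?P<inner>.*)\)\s+"
    r"SPARK_GEN_SUBQ_\d+\s+WHERE\s+1\s*=\s*0\s*$",
    re.IGNORECASE | re.DOTALL,
)


def clean_spark_subquery(sql: str) -> str:
    """Unwrap Spark's generated subquery, or return the SQL unchanged."""
    m = _SPARK_WRAP.match(sql)
    return m.group("inner").strip() if m else sql


wrapped = "SELECT * FROM (SELECT name FROM Person) SPARK_GEN_SUBQ_0 WHERE 1=0"
print(clean_spark_subquery(wrapped))  # SELECT name FROM Person
```

The inner query is then what the SQL-to-Cypher translator actually sees.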
### Precedent: The AWS Glue Project

The `neo4j-aws-glue` project already does exactly this for AWS Glue. It is a small Maven project that:

- Depends on `neo4j-jdbc`, `neo4j-jdbc-translator-impl`, and `neo4j-jdbc-translator-sparkcleaner`
- Adds its own custom translator (`AwsGlueTranslator`) that rewrites `WHERE 1=0` to `LIMIT 1` for Glue's schema-probing behavior
- Uses `maven-shade-plugin` to merge everything into a single JAR with relocated packages (Jackson, Netty, jOOQ, Bolt, Cypher DSL, etc.) under `org.neo4j.jdbc.internal.shaded.*` to avoid classpath conflicts
- Registers the custom translator via Java SPI (`META-INF/services/org.neo4j.jdbc.translator.spi.TranslatorFactory`)
- Produces a self-contained JAR that users drop into AWS Glue with zero additional setup

This project follows the same pattern.
### Custom Databricks Translator

The AWS Glue project needs a custom `AwsGlueTranslator` because AWS Glue sends its own `WHERE 1=0` pattern for schema probing that differs from Spark's. Databricks uses standard Spark through SafeSpark, so the existing `neo4j-jdbc-translator-sparkcleaner` handles the subquery wrapping without any additional custom translator.

If testing reveals Databricks-specific SQL patterns that the existing translators don't handle (for example, SafeSpark may introduce its own query wrapping beyond what standard Spark does), a custom `DatabricksTranslator` can be added later following the same SPI pattern. The project structure accommodates this possibility even though the current version ships without one.
### SPI Service Registration

Unlike the AWS Glue project, this connector has no custom translator factory to register via SPI. The bundled `neo4j-jdbc-translator-sparkcleaner` and `neo4j-jdbc-translator-impl` JARs each include their own `META-INF/services/org.neo4j.jdbc.translator.spi.TranslatorFactory` file. The `ServicesResourceTransformer` in the maven-shade-plugin automatically merges these SPI registrations into the shaded JAR, so no custom services file is needed. If a `DatabricksTranslator` is added later, its factory would be registered via a new services file at that point.
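What the `ServicesResourceTransformer` does can be modeled in a few lines: for each service interface, it concatenates the provider lists found across the input JARs into one merged services file. A toy Python sketch (not the plugin's actual code; the provider names in the example are made up):

```python
def merge_service_files(per_jar_contents: list) -> str:
    """Toy model of maven-shade's ServicesResourceTransformer for one
    service interface: concatenate provider lists from each input JAR,
    dropping comments and duplicates, preserving first-seen order."""
    seen, merged = set(), []
    for content in per_jar_contents:
        for line in content.splitlines():
            provider = line.split("#", 1)[0].strip()
            if provider and provider not in seen:
                seen.add(provider)
                merged.append(provider)
    return "\n".join(merged) + "\n"


# Hypothetical example: two JARs each contribute one TranslatorFactory.
print(merge_service_files([
    "com.example.FactoryA\n",
    "com.example.FactoryB  # a comment\n",
]))
```

At runtime, `ServiceLoader` reads the merged file from the shaded JAR and instantiates every listed factory, which is exactly what the SPI-discovery unit tests below exercise.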
### User-Agent Identification

The project includes a `META-INF/neo4j-jdbc-user-agent.txt` file containing:

```
neo4j-unity-catalog-connector/${project.version}
```

This string is sent by the Neo4j JDBC driver to the Neo4j server with every connection. The `${project.version}` placeholder is substituted by Maven at build time (via `<filtering>true</filtering>` in the pom.xml). This lets Neo4j (especially Aura) distinguish connections coming from the Databricks UC connector from those made by the plain JDBC driver or the Glue connector — useful for support, usage analytics, and debugging.
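The substitution itself is plain placeholder interpolation. A minimal Python model of the effect on this file (Maven's real resource filtering supports more delimiters and property sources, and leaves unresolved placeholders intact):

```python
import re


def filter_resource(text: str, props: dict) -> str:
    """Minimal model of Maven resource filtering: replace each
    ${key} with its property value; unknown keys are left as-is."""
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: props.get(m.group(1), m.group(0)),
        text,
    )


print(filter_resource(
    "neo4j-unity-catalog-connector/${project.version}",
    {"project.version": "1.0.0-SNAPSHOT"},
))  # neo4j-unity-catalog-connector/1.0.0-SNAPSHOT
```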
### Package Relocation

All bundled dependencies are relocated to avoid conflicts with whatever JARs are already on the Databricks SafeSpark sandbox classpath. The relocation scheme from the AWS Glue project (`org.neo4j.jdbc.internal.shaded.*`) is reused as-is, since it was designed by the Neo4j Connectors team for exactly this purpose.
### Impact on the User Experience

**Before (two JARs):**

```sql
CREATE CONNECTION neo4j_connection TYPE JDBC
ENVIRONMENT (
  java_dependencies '[
    "/Volumes/catalog/schema/jars/neo4j-jdbc-full-bundle-6.10.5.jar",
    "/Volumes/catalog/schema/jars/neo4j-jdbc-translator-sparkcleaner-6.10.5.jar"
  ]'
)
OPTIONS (...)
```

**After (one JAR):**

```sql
CREATE CONNECTION neo4j_connection TYPE JDBC
ENVIRONMENT (
  java_dependencies '["/Volumes/catalog/schema/jars/neo4j-unity-catalog-connector-1.0.0.jar"]'
)
OPTIONS (...)
```
### Decisions

1. **Repo location:** Subdirectory within `neo4j-uc-integration` (`neo4j-unity-catalog-connector/`).

2. **Artifact naming:** `neo4j-unity-catalog-connector` (groupId: `org.neo4j`, artifactId: `neo4j-unity-catalog-connector`).

3. **Version alignment:** Independent versioning (starting at `1.0.0-SNAPSHOT`), with the upstream `neo4j-jdbc` dependency version pinned separately (initially `6.10.5`).

4. **Custom translator:** Not needed initially. The existing `sparkcleaner` translator handles Databricks/Spark subquery wrapping. If testing reveals Databricks-specific SQL patterns, a `DatabricksTranslator` can be added following the `AwsGlueTranslator` SPI pattern.
### Implementation Progress

#### Phase 1: Create the Maven Project — COMPLETE

Built and verified locally. The `neo4j-unity-catalog-connector/` subdirectory contains:

```
neo4j-unity-catalog-connector/
├── .mvn/wrapper/
│   ├── maven-wrapper.jar
│   └── maven-wrapper.properties
├── src/
│   ├── main/resources/META-INF/
│   │   └── neo4j-jdbc-user-agent.txt
│   └── test/java/org/neo4j/uc/
│       └── BundledTranslatorsTest.java
├── mvnw
├── mvnw.cmd
├── pom.xml
└── README.md
```

**Build verification:**

- `./mvnw clean verify` succeeds (6 tests pass)
- Produces `neo4j-unity-catalog-connector-1.0.0-SNAPSHOT.jar` (11 MB)
- User-agent in JAR: `neo4j-unity-catalog-connector/1.0.0-SNAPSHOT`
- SPI services merged: `SqlToCypherTranslatorFactory` + `SparkSubqueryCleaningTranslatorFactory`
- 5952 classes relocated under `org.neo4j.jdbc.internal.shaded.*`

**Unit tests (`BundledTranslatorsTest`):**

- SPI discovery: verifies both `SqlToCypherTranslatorFactory` and `SparkSubqueryCleaningTranslatorFactory` are found via `ServiceLoader`
- Factory creation: verifies all discovered factories produce non-null `Translator` instances
- Pipeline integration: verifies the full translator pipeline (Spark cleaner + SQL-to-Cypher) processes Spark-wrapped queries without error and removes `SPARK_GEN_SUBQ` wrapping
- Spark cleaner pass-through: verifies the cleaner handles plain Cypher without throwing
- JDBC driver loading: verifies `org.neo4j.jdbc.Neo4jDriver` is on the classpath

**CI/CD workflows added:**

- `.github/workflows/connector-build.yml` — builds on push to `main` and on PRs, scoped to `neo4j-unity-catalog-connector/` path changes
- `.github/workflows/connector-release.yml` — publishes a GitHub Release on `connector-*` tags
#### Phase 2: Validate with Databricks — NOT STARTED

#### Phase 3: Update Documentation — NOT STARTED

#### Phase 4: CI/CD and Release — PARTIAL

- GitHub Actions workflows created (build + release)
- Dependabot configuration not yet added
- Maven Central publishing decision deferred
