fix(rdf): converge Fuseki state on weekly rebuilds and isolate API latency #28117

harshach wants to merge 2 commits into
Conversation
fix(rdf): converge Fuseki state on weekly rebuilds and isolate API latency
RdfIndexApp ran daily and never reconciled removed relationships, so triples
grew unboundedly across runs. When Fuseki crash-looped on the resulting disk
pressure, every entity-write hook blocked synchronously on the unreachable
server (no HTTP connect timeout, 3-retry loop on ConnectException), saturating
the bounded AsyncService pool and pushing login to ~45s.
Storage-side fixes (stop growth):
- Drop the extractRelationshipTriples "preserve forward" path in
RdfRepository.createOrUpdate; the translator is the source of truth and the
surrounding orchestration already rewrites the current relationship set.
This also removes a wasted CONSTRUCT round-trip per entity write.
- bulkStoreRelationships now does per-source-entity DELETE WHERE with a
  predicate-exclusion FILTER for lineage edges, so relationships that no
  longer exist actually leave the store (the generated update is sketched
  after this list).
- Wire RdfRepository.clearAllGlossaryTermRelations() into RdfIndexApp's
initializeJob (the method existed but had no callers).
- Flip recreateIndex default to true and move the cron to Saturday midnight
("0 0 * * 6"). Add reloadOntologies() so CLEAR ALL doesn't leave the
ontology graph empty before indexing starts.
- Include a 2.0.1 post-data migration that updates existing installed_apps
rows; the app loader is insert-only on upgrade.
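For illustration, the update that one per-source reconciliation pass would issue, assuming the default baseUri (https://open-metadata.org/); sourceUri and KNOWLEDGE_GRAPH follow the naming in the snippets later on this page:

// Sketch of the generated SPARQL for a single source entity: delete every
// outgoing entity-to-entity edge except the excluded lineage predicates, so
// relationships removed in MySQL also disappear from Fuseki.
String deleteUpdate =
    "DELETE { GRAPH <" + KNOWLEDGE_GRAPH + "> { <" + sourceUri + "> ?p ?o } } "
        + "WHERE { GRAPH <" + KNOWLEDGE_GRAPH + "> { <" + sourceUri + "> ?p ?o . "
        + "FILTER(isIRI(?o) "
        + "&& STRSTARTS(STR(?o), \"https://open-metadata.org/entity/\") "
        + "&& ?p != <https://open-metadata.org/ontology/UPSTREAM> "
        + "&& ?p != <http://www.w3.org/ns/prov#wasDerivedFrom> "
        + "&& ?p != <https://open-metadata.org/ontology/hasLineageDetails>) } }";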
Connectivity / concurrency fixes (isolate API latency from Fuseki health):
- Add 2s connectTimeout to every JenaFusekiStorage HttpClient and fast-fail
  on ConnectException / ClosedChannelException / HttpConnectTimeoutException
  instead of retrying. Introduce a 5-failure/30s circuit breaker (sketched
  after this list).
- Route all RdfUpdater mutators through AsyncService.execute with a bounded
pendingWrites gate (cap 1000, drop-on-overflow with logged warning) so a
dead Fuseki can no longer block request threads or starve the AsyncService
pool.
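A minimal sketch of the fast-fail classification and breaker state described in the first bullet above; the class, field, and helper names here are assumptions layered on what the commit message states (5 consecutive failures, 30 s open window), not necessarily what JenaFusekiStorage uses:

import java.net.ConnectException;
import java.net.http.HttpConnectTimeoutException;
import java.nio.channels.ClosedChannelException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

final class FusekiCircuitSketch {
  private static final int FAILURE_THRESHOLD = 5;
  private static final long OPEN_WINDOW_MS = 30_000;

  private final AtomicInteger consecutiveFailures = new AtomicInteger();
  private final AtomicLong openedAtMillis = new AtomicLong();

  // True for the exception types the PR fast-fails on instead of retrying.
  static boolean isConnectError(Throwable e) {
    for (Throwable t = e; t != null; t = t.getCause()) {
      if (t instanceof ConnectException
          || t instanceof ClosedChannelException
          || t instanceof HttpConnectTimeoutException) {
        return true;
      }
    }
    return false;
  }

  // Callers skip Fuseki entirely while the breaker is open.
  boolean isOpen() {
    long openedAt = openedAtMillis.get();
    return openedAt != 0 && System.currentTimeMillis() - openedAt < OPEN_WINDOW_MS;
  }

  void recordFailure() {
    if (consecutiveFailures.incrementAndGet() >= FAILURE_THRESHOLD) {
      openedAtMillis.set(System.currentTimeMillis());
      consecutiveFailures.set(0);
    }
  }

  void recordSuccess() {
    consecutiveFailures.set(0);
    openedAtMillis.set(0);
  }
}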
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| .append("entity/\") && ?p != <https://open-metadata.org/ontology/UPSTREAM> && ?p") | ||
| .append( | ||
| " != <http://www.w3.org/ns/prov#wasDerivedFrom> && ?p != <https://open-metadata.org/ontology/hasLineageDetails>) } }"); |
⚠️ Bug: Hardcoded predicate URIs in DELETE filter ignore configurable baseUri
In bulkStoreRelationships, the SPARQL DELETE WHERE filter hardcodes https://open-metadata.org/ontology/UPSTREAM and https://open-metadata.org/ontology/hasLineageDetails to exclude lineage predicates from deletion. However, the INSERT uses the configurable baseUri field for the ontology prefix (PREFIX om: <baseUri + "ontology/">). If baseUri is configured to anything other than https://open-metadata.org/, the predicates stored in the graph won't match the hardcoded exclusion URIs, causing lineage edges to be incorrectly deleted on every reconciliation run.
The storeRelationship method (single-relationship path) also hardcodes the same ontology URI pattern, so this is consistent within the file — but both paths are broken for non-default baseUri configurations.
Use baseUri for ontology predicates in the DELETE filter to match the INSERT path, keeping only the W3C prov URI hardcoded (since it's a well-known external vocabulary):
deleteUpdate
    .append("DELETE { GRAPH <")
    .append(KNOWLEDGE_GRAPH)
    .append("> { <")
    .append(sourceUri)
    .append("> ?p ?o } } WHERE { GRAPH <")
    .append(KNOWLEDGE_GRAPH)
    .append("> { <")
    .append(sourceUri)
    .append("> ?p ?o . FILTER(isIRI(?o) && STRSTARTS(STR(?o), \"")
    .append(baseUri)
    .append("entity/\") && ?p != <")
    .append(baseUri)
    .append("ontology/UPSTREAM> && ?p")
    .append(" != <http://www.w3.org/ns/prov#wasDerivedFrom> && ?p != <")
    .append(baseUri)
    .append("ontology/hasLineageDetails>) } }");
Pull request overview
This PR aims to make RDF/Fuseki indexing converge more reliably and reduce platform latency impact when Fuseki is unhealthy.
Changes:
- Changes RDF app defaults to weekly recreate-index runs and adds migrations for existing app rows.
- Adds Fuseki connection timeout/circuit-breaker handling and async RDF updater submission.
- Adjusts RDF reindex cleanup paths, ontology reload after clear, and related unit tests.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 11 comments.
Summary per file:
| File | Description |
|---|---|
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/rdf/RdfIndexAppTest.java | Adds coverage for ontology reload and glossary relation cleanup behavior. |
| openmetadata-service/src/main/resources/json/data/appMarketPlaceDefinition/RdfIndexApp.json | Updates marketplace default recreateIndex to true. |
| openmetadata-service/src/main/resources/json/data/app/RdfIndexApp.json | Updates app default recreateIndex and weekly cron schedule. |
| openmetadata-service/src/main/java/org/openmetadata/service/rdf/storage/JenaFusekiStorage.java | Adds timeout/circuit-breaker state and relationship reconciliation changes. |
| openmetadata-service/src/main/java/org/openmetadata/service/rdf/RdfUpdater.java | Moves RDF mutating hooks to bounded async submission. |
| openmetadata-service/src/main/java/org/openmetadata/service/rdf/RdfRepository.java | Adds ontology reload and removes relationship preservation during entity writes. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/rdf/RdfIndexApp.java | Wires glossary relation cleanup and ontology reload after full RDF clear. |
| bootstrap/sql/migrations/native/2.0.1/postgres/postDataMigrationSQLScript.sql | Migrates existing PostgreSQL app rows to new RDF app defaults. |
| bootstrap/sql/migrations/native/2.0.1/mysql/postDataMigrationSQLScript.sql | Migrates existing MySQL app rows to new RDF app defaults. |
Model rdfModel = translator.toRdf(entity);

// Preserve existing relationship triples before updating
// This prevents postCreate() from overwriting relationships added by storeRelationships()
Model existingModel = storageService.getEntity(entityType, entity.getId());
if (existingModel != null && !existingModel.isEmpty()) {
  String entityUri =
      config.getBaseUri().toString() + "entity/" + entityType + "/" + entity.getId();
  // Extract and preserve relationship triples (where entity is subject and object is a URI)
  Model relationshipTriples = extractRelationshipTriples(existingModel, entityUri);
  if (!relationshipTriples.isEmpty()) {
    rdfModel.add(relationshipTriples);
    LOG.debug(
        "Preserved {} relationship triples for entity {}",
        relationshipTriples.size(),
        entity.getId());
  }
}

storageService.storeEntity(entityType, entity.getId(), rdfModel);
submitAsync(
    "updateEntity " + entity.getId(),
    () -> {
      Timer.Sample sample = RequestLatencyContext.startRdfOperation();
      try {
        rdfRepository.createOrUpdate(entity);

submitAsync(
    "removeRelationship",
    () -> {
      Timer.Sample sample = RequestLatencyContext.startRdfOperation();
      try {
        rdfRepository.removeRelationship(relationship);
      } catch (Exception e) {
@@ -371,32 +458,46 @@ public void bulkStoreRelationships(List<RelationshipData> relationships) {
    if (relationships.isEmpty()) {
      return;

    try {
      UpdateRequest deleteRequest = UpdateFactory.create(deleteUpdate.toString());
      connection.update(deleteRequest);
    } catch (Exception e) {
      if (isConnectError(e)) {
        recordFailure();
        throw new RuntimeException(
            "Failed to bulk store relationships in RDF (Fuseki unreachable)", e);
      }
      // Tolerate non-connect delete errors — the source entities may not
      // have any prior outgoing edges yet (first-time indexing).
      LOG.debug("Per-source delete completed (some sources may not have had prior edges)");
    }
// bulkAddGlossaryTermRelations has no per-batch DELETE side, so stale
// glossary-term relations would accumulate forever across reindex runs.
// When recreateIndex=true clearAll() already wipes everything, so we
// only need this targeted cleanup on incremental runs.
if (!Boolean.TRUE.equals(jobData.getRecreateIndex())
    && jobData.getEntities() != null
    && jobData.getEntities().contains(Entity.GLOSSARY_TERM)) {
  LOG.info("Clearing existing glossary term relations before re-indexing");
  try {
    rdfRepository.clearAllGlossaryTermRelations();
  } catch (Exception e) {
    LOG.warn("Failed to clear glossary term relations; continuing with reindex", e);
  }
java.net.http.HttpClient httpClient =
    java.net.http.HttpClient.newBuilder()
        .connectTimeout(CONNECT_TIMEOUT)

java.net.http.HttpClient httpClient =
    java.net.http.HttpClient.newBuilder().connectTimeout(CONNECT_TIMEOUT).build();
this.connection =
    RDFConnectionFuseki.create().destination(endpoint).httpClient(httpClient).build();
verify(mockRdfRepository).clearAll();
// CLEAR ALL wipes ontology/shapes graphs; clearRdfData() must reload them
// so post-wipe SPARQL queries that rely on the ontology keep working.
verify(mockRdfRepository).reloadOntologies();
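Commit 2 tightens this into an ordered assertion; a minimal Mockito sketch of what that would look like (mock name as above; InOrder is standard Mockito API):

import static org.mockito.Mockito.inOrder;

import org.mockito.InOrder;

// Ordered verification: a plain pair of verify() calls would still pass if a
// future change ran reloadOntologies() before clearAll(); InOrder will not.
InOrder inOrder = inOrder(mockRdfRepository);
inOrder.verify(mockRdfRepository).clearAll();
inOrder.verify(mockRdfRepository).reloadOntologies();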
try {
  rdfRepository.clearAllGlossaryTermRelations();
} catch (Exception e) {
  LOG.warn("Failed to clear glossary term relations; continuing with reindex", e);
}
…surface ontology failures

PR #28117 review feedback. Addresses 13 findings across gitar-bot and Copilot:

Storage correctness:
- JenaFusekiStorage.storeEntity now keeps URI-valued triples (relationships) and
  only refreshes literal-valued triples. A metadata-only PATCH would otherwise
  wipe every inter-entity edge until the next weekly recreate-index, and async
  ordering between updateEntity and addRelationship could leave the graph
  missing edges (Copilot #1, #2).
- RdfRepository.removeRelationship wraps the DELETE in the knowledge named graph
  and uses getRelationshipPredicate so the predicate URI matches what
  addRelationship actually wrote (e.g. UPSTREAM → prov:wasDerivedFrom). The
  previous bare DELETE in the default graph was a silent no-op (Copilot #3).
- RdfBatchProcessor now calls a new RdfRepository.clearOutgoingEntityRelationships
  for every entity in the batch, not just those with current edges. An entity
  whose last outgoing relationship was removed in MySQL contributes zero
  RelationshipData entries, so bulkStoreRelationships' per-source DELETE never
  fired for it (Copilot #4).
- bulkStoreRelationships no longer swallows non-connect DELETE errors — DELETE
  WHERE on a source with no edges is a no-op, so exceptions there are real
  failures (malformed SPARQL, auth, server errors) and should surface
  (Copilot #5).

Visibility:
- reloadOntologies() now checks areOntologiesLoaded() after load and throws if
  still empty (sketched below). OntologyLoader.loadOntologies catches
  internally, so the old reloadOntologies always appeared to succeed
  (Copilot #6).
- clearAllGlossaryTermRelations rethrows on failure instead of silently
  logging — the indexer's caller can now react to cleanup failures (Copilot #10).
- clearAllGlossaryTermRelations pulls custom predicate URIs from
  GlossaryTermRelationSettings and includes them in the DELETE FILTER. The
  hardcoded list missed any custom predicates an admin configured (Copilot #7).

Quality:
- Set / LinkedHashSet imported instead of using java.util.* fully qualified in
  JenaFusekiStorage and RdfBatchProcessor (gitar-bot #2).
- RdfIndexAppTest uses InOrder to assert clearAll → reloadOntologies ordering —
  a plain verify would have accepted a future change that reordered the calls
  (Copilot #9).
- Documented the residual gap that HttpClient.connectTimeout only bounds TCP
  connect, not request bodies; circuit breaker + bounded pendingWrites contain
  the blast radius (Copilot #8).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
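A minimal sketch of the verified reload described in the Visibility section above; the reloadOntologies/areOntologiesLoaded names come from the commit message, while the ontologyLoader field and the exception choice are assumptions:

// Fail loudly if the ontology graph is still empty after a reload.
// OntologyLoader.loadOntologies catches internally, so success has to be
// re-checked rather than inferred from the absence of an exception.
public void reloadOntologies() {
  ontologyLoader.loadOntologies();
  if (!areOntologiesLoaded()) {
    throw new IllegalStateException(
        "Ontology graph still empty after reload; aborting RDF reindex");
  }
}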
Addressed all 13 review findings in the second commit; see the commit message above for the details, grouped under Storage correctness, Visibility, Quality, Documented gaps, and Treated as false positive.
org.openmetadata.schema.configuration.GlossaryTermRelationSettings settings =
    org.openmetadata.service.resources.settings.SettingsCache.getSetting(
        org.openmetadata.schema.settings.SettingsType.GLOSSARY_TERM_RELATION_SETTINGS,
💡 Quality: Fully qualified class names in clearAllGlossaryTermRelations
Lines 2546-2548 use fully qualified names (org.openmetadata.schema.configuration.GlossaryTermRelationSettings, org.openmetadata.service.resources.settings.SettingsCache, org.openmetadata.schema.settings.SettingsType) instead of imports. Per project conventions, wildcard and fully-qualified names should be avoided — add proper imports at the top of the file.
Replace fully qualified names with imports for readability and consistency:
// Add to imports at top of file:
import org.openmetadata.schema.configuration.GlossaryTermRelationSettings;
import org.openmetadata.schema.settings.SettingsType;
import org.openmetadata.service.resources.settings.SettingsCache;
// Then replace FQNs in method body with simple names:
GlossaryTermRelationSettings settings =
SettingsCache.getSetting(
SettingsType.GLOSSARY_TERM_RELATION_SETTINGS,
GlossaryTermRelationSettings.class);
private static String expandPredicateCurie(String uri) {
  if (uri == null || uri.isEmpty()) {
    return "https://open-metadata.org/ontology/relatedTo";
  }
💡 Edge Case: expandPredicateCurie silently defaults null/empty to relatedTo
expandPredicateCurie (line 2709-2710) returns "https://open-metadata.org/ontology/relatedTo" for null or empty input. In clearAllGlossaryTermRelations, this means a misconfigured relation type with a null rdfPredicate would cause the cleanup to target relatedTo triples even if the configured type never wrote them — or worse, skip cleaning the actual custom predicate. Since the null case is already guarded by the if (rdfPredicate != null) check at line 2553, this default is unreachable in current code but could silently mask bugs if called from elsewhere.
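If callers outside the guarded path become a concern, one hedged hardening is to fail fast instead of defaulting (illustrative sketch only; the current call site already null-checks rdfPredicate):

private static String expandPredicateCurie(String uri) {
  if (uri == null || uri.isEmpty()) {
    // Surface misconfiguration instead of silently targeting relatedTo.
    throw new IllegalArgumentException("rdfPredicate must be a non-empty CURIE or IRI");
  }
  // ... existing expansion logic unchanged
}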
🟡 Playwright Results — all passed (15 flaky)
✅ 4055 passed · ❌ 0 failed · 🟡 15 flaky · ⏭️ 103 skipped
🟡 15 flaky test(s) (passed on retry)

How to debug locally:
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip  # view trace



Describe your changes:
RDF Knowledge Graph indexing was duplicating triples and accumulating disk + memory in Fuseki on every run; when Fuseki crash-looped, every entity-write hook blocked synchronously on the unreachable server (no HTTP timeout, 3-retry loop), saturating the bounded AsyncService pool and pushing login latency to ~45 s. The reindex now uses recreateIndex=true on a weekly Saturday cadence, every reconciliation path actually deletes removed relationships, and the Fuseki client has a 2 s connect timeout + circuit breaker so a dead Fuseki can no longer block request threads.

Type of change:
High-level design:
Storage-side (stop growth):
- RdfRepository.createOrUpdate no longer preserves stale relationship triples — the translator is the source of truth and surrounding orchestration rewrites the current set. Also removes a wasted CONSTRUCT round-trip per write.
- bulkStoreRelationships does per-source-entity DELETE WHERE with a predicate-exclusion FILTER for lineage edges, so removed relationships actually leave the store.
- RdfRepository.clearAllGlossaryTermRelations() is now wired into RdfIndexApp.initializeJob (the method existed but had no callers).
- recreateIndex default flipped to true, cron moved to "0 0 * * 6" (Saturday midnight), and reloadOntologies() runs after clearAll() so the ontology graph isn't left empty.
- 2.0.1/{mysql,postgres}/postDataMigrationSQLScript.sql updates existing installed_apps rows; the app loader is insert-only on upgrade.

Connectivity / concurrency (isolate platform from Fuseki health):
- JenaFusekiStorage HttpClients now use connectTimeout=2s; on ConnectException / ClosedChannelException / HttpConnectTimeoutException we fast-fail instead of retrying. A 5-failure/30 s circuit breaker short-circuits subsequent calls until Fuseki recovers (probed via testConnection, which bypasses the breaker).
- RdfUpdater mutators now go through AsyncService.execute(...) (the existing virtual-thread pool) with a bounded pendingWrites gate (cap 1000, drop-on-overflow with logged warning) so the request thread returns immediately and a dead Fuseki cannot starve AsyncService permits. A minimal sketch of this gate follows.
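A minimal sketch of the bounded gate described above, under the stated cap of 1000; class and method names are illustrative, and Thread.startVirtualThread stands in for the real AsyncService.execute submission:

import java.util.concurrent.atomic.AtomicInteger;

final class RdfWriteGateSketch {
  private static final int MAX_PENDING_WRITES = 1000;
  private final AtomicInteger pendingWrites = new AtomicInteger();

  void submitAsync(String description, Runnable work) {
    if (pendingWrites.incrementAndGet() > MAX_PENDING_WRITES) {
      // Over the cap: release the slot and drop with a warning, so a dead
      // Fuseki can only ever cost one counter increment on the request thread.
      pendingWrites.decrementAndGet();
      System.err.println("WARN dropping RDF write: " + description);
      return;
    }
    Thread.startVirtualThread( // stand-in for AsyncService's virtual-thread pool
        () -> {
          try {
            work.run();
          } finally {
            pendingWrites.decrementAndGet();
          }
        });
  }
}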
Tests:

Unit tests

Extended RdfIndexAppTest:
- The recreateIndex=true test now also verifies reloadOntologies() is called after clearAll().
- clearAllGlossaryTermRelations() is invoked when glossaryTerm is in the entity set AND recreateIndex=false.
- No cleanup call is made when glossaryTerm is absent.

Backend integration tests

A co.elastic.clients.* shading compile issue unrelated to this work blocks the module build. Once that is fixed, the planned end-to-end tests (re-run indexer twice → triple count unchanged; remove an edge in MySQL → triple disappears in Fuseki; point RDF endpoint at a closed port → write returns <500 ms; recreateIndex=true → ontology graph non-empty after run) should be added.

Manual testing performed

Verified the changes against existing server APIs (Entity.GLOSSARY_TERM constant, AsyncService API) and compared against a git stash baseline to confirm pre-existing compile errors are unrelated to these changes.

UI screen recording / screenshots:
Not applicable.
Checklist:
🤖 Generated with Claude Code