Multi lang support for content #25921

Open
harshach wants to merge 3 commits into main from multi-lang-updated

@harshach harshach commented Feb 16, 2026

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

This PR introduces comprehensive multi-language (i18n) support for entity content in OpenMetadata, enabling locale-specific storage and retrieval of displayNames and descriptions across the platform.

Key Technical Changes:

  • Multi-Language Infrastructure: Added translations field to 62+ entity schema JSON files (Table, Topic, Container, SearchIndex, DashboardDataModel, and more) to store locale-specific translations for displayName and description fields with full recursive support for nested columns/fields.

  • Translation Processing Utilities: Created TranslationUtil for locale-aware entity retrieval with intelligent fallback logic (e.g., "fr-FR" → "fr" → default "en"), and TranslationPatchUtil for handling PATCH operations on translations across entity hierarchies with validation and merge semantics.

  • Locale-Aware Repository Layer: Enhanced EntityRepository and entity-specific repositories (TableRepository, TopicRepository, ContainerRepository, etc.) to support locale parameter in get/getList operations with automatic translation application before response serialization.

  • REST Endpoint Enhancements: Added @QueryParam("locale") to all REST endpoints across data entity resources (Table, Topic, Container, SearchIndex, DashboardDataModel) for both GET and PATCH operations, enabling client-side language preference selection.

  • Comprehensive Test Coverage: Added translation tests covering multiple entity types, column/field translations, locale fallback mechanisms, PATCH operations with locale handling, and proper serialization of translations.
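
The fallback order mentioned in the summary ("fr-FR" → "fr" → default "en") could be computed roughly as follows. This is a minimal sketch, not the PR's TranslationUtil: the class name `LocaleFallback`, the method `candidates`, and the hard-coded default locale "en" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; illustrates the fallback order only. */
final class LocaleFallback {
  // Assumed default locale, taken from the "en" default in the summary.
  static final String DEFAULT_LOCALE = "en";

  /** Returns the lookup order, most specific first, ending with the default. */
  static List<String> candidates(String requested) {
    List<String> chain = new ArrayList<>();
    if (requested != null && !requested.isBlank()) {
      String current = requested.trim();
      while (!current.isEmpty()) {
        if (!chain.contains(current)) {
          chain.add(current);
        }
        // Drop the last "-" separated subtag: "fr-FR" becomes "fr".
        int cut = current.lastIndexOf('-');
        current = cut > 0 ? current.substring(0, cut) : "";
      }
    }
    if (!chain.contains(DEFAULT_LOCALE)) {
      chain.add(DEFAULT_LOCALE);
    }
    return chain;
  }
}
```

A caller would try each candidate in order and stop at the first locale that has a stored translation.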

@github-actions
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@github-actions
Contributor

TypeScript types have been updated based on the JSON schema changes in the PR

@gitar-bot
gitar-bot bot commented Feb 16, 2026

🔍 CI failure analysis for bbcd40c: Maven Sonarcloud CI failed with 64 test failures, primarily in translation tests. The tests expect the English displayName but receive the Spanish translation, which indicates a bug in either the translation retrieval logic or the test implementation.

Issue

Maven Sonarcloud CI build failed with 64 test failures:

  • Job: maven-sonarcloud-ci (job id: 63787676920)
  • Test result: 8003 tests run, 701 skipped, 64 failed
  • All failures are directly related to this PR's translation feature

Root Cause

This failure IS related to the PR changes - there is a bug in how translations are retrieved or tested.

Primary Failure Pattern (60+ tests)

Error at line 1338 in EntityResourceTest.test_translations:

expected: <English Display Name> but was: <Nombre de Visualización en Español>

Affected test classes (60+ entity types):

  • AIApplicationResourceTest
  • APICollectionResourceTest
  • ChartResourceTest
  • DashboardResourceTest
  • DatabaseResourceTest
  • TableResourceTest
  • TopicResourceTest
  • ContainerResourceTest
  • SearchIndexResourceTest
  • DashboardDataModelResourceTest
  • And 50+ more entity resource tests

What's Happening

Test sequence (from logs):

  1. Test creates an entity with English displayName
  2. Test patches in Spanish translation:
    {"op":"add","path":"/translations/translations/-",
     "value":{"locale":"es","displayName":"Nombre de Visualización en Español",
              "description":"Descripción en Español"}}
  3. Test retrieves the entity (presumably without locale parameter or with locale=en)
  4. Expected: Get English displayName (original field value)
  5. Actual: Get Spanish displayName from translations

The bug: When retrieving an entity, the translation is being applied even when:

  • No locale parameter is specified, OR
  • The locale parameter is "en" (English), OR
  • The original field values should be returned instead of translated values

Analysis

This is a critical bug in the translation feature logic:

The translation retrieval logic in TranslationUtil.java is incorrectly applying translations when it shouldn't. Looking at the code review findings, this relates to:

  1. Issue with fallback behavior: When no locale is specified or locale="en", the system should return the original displayName/description fields, not look in translations.

  2. Possible causes:

    • TranslationUtil.applyTranslation() is being called unconditionally
    • Default locale handling is incorrect (treating missing locale as "es" instead of "en")
    • The translation application logic doesn't check if the requested locale matches the original content locale

Test expectation (line 1338):
The test verifies that after adding a Spanish translation, retrieving the entity without a locale parameter (or with locale=en) still returns the original English values, not the Spanish translation.
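
One way to express the expectation above is a guard that decides whether translations should be applied at all. The sketch below is an assumption about how such a check might look, not code from the PR: `TranslationGuard` and `shouldApplyTranslation` are hypothetical names, and it assumes the original content locale is exactly "en".

```java
import java.util.Locale;

/** Hypothetical guard; not part of the PR's TranslationUtil. */
final class TranslationGuard {
  // Assumption: original displayName/description are authored in "en".
  static final String DEFAULT_LOCALE = "en";

  /** True only when a non-default locale was explicitly requested. */
  static boolean shouldApplyTranslation(String locale) {
    if (locale == null || locale.isBlank()) {
      return false; // no locale parameter: return the original fields
    }
    String normalized = locale.trim().toLowerCase(Locale.ROOT);
    return !normalized.equals(DEFAULT_LOCALE);
  }
}
```

With a check like this in front of the translation step, a GET without a locale (or with locale=en) would leave the original fields untouched, which is what line 1338 of the test asserts.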

Secondary Failures (DataProduct tests)

5 additional failures in DataProductResourceTest:

  • testDataProductBulkOutputPorts:850 - expected success but was failure
  • testDataProductDomainMigrationWithInputOutputPorts:1732 - wrong domain UUID
  • testGetOutputPortsReturnsFullEntities:1012 - expected 1 port but was 0
  • testGetPortsByNameEndpoints:1108 - expected 1 port but was 0
  • testGetPortsViewEndpoint:1066 - expected 1 port but was 0

These suggest that the translation changes may have broken DataProduct port retrieval logic, possibly related to how entities are serialized/deserialized with the new translations field.

Details

Failure distribution:

  • 60 failures: test_translations method across all entity types
  • 1 failure: test_translations in DataProductResourceTest
  • 3 failures: DataProduct output port counting/retrieval
  • 1 failure: DataProduct domain migration

All failures trace back to the translation feature implementation.

Code Review 🚫 Blocked 0 resolved / 7 findings

All 7 previous findings remain unresolved: authorization bypass on translation PATCH by ID, FQN locale patch not using patchWithTranslations, duplicate translation entries on every PATCH, EntityUpdater bypass losing change tracking, pervasive LOG.info debug statements on hot paths, empty string fallback blanking displayName/description, and unused opType variable.

🚨 Security: Translation PATCH by ID bypasses authorization entirely

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java:649

When patchInternal(uriInfo, securityContext, id, patch, locale, changeSource) detects a translation patch (locale is non-null and not "en"), it directly calls repository.patchWithTranslations() at line 663 and returns, completely skipping the authorization check.

Compare with the regular patchInternal at line 687, which calls authorizer.authorize(securityContext, operationContext, getResourceContextById(id, ResourceContextInterface.Operation.PATCH)) before executing the patch.

This means any authenticated user can modify any entity's translations regardless of their permissions, which is a significant access control bypass. The authorization check must be added before calling patchWithTranslations.

Suggested fix:

    // Process the patch based on locale
    JsonPatch processedPatch = patch;
    boolean isTranslationPatch = locale != null && !locale.isEmpty() && !"en".equals(locale);
    if (isTranslationPatch) {
      // Authorize the patch operation
      OperationContext operationContext = new OperationContext(entityType, patch);
      authorizer.authorize(
          securityContext,
          operationContext,
          getResourceContextById(id, ResourceContextInterface.Operation.PATCH));
      // Transform patch operations on displayName/description to translation updates
      processedPatch = TranslationPatchUtil.handleTranslationPatch(patch, locale, false);

⚠️ Bug: PATCH by FQN with locale doesn't use patchWithTranslations

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java:732

The FQN-based patchInternal with locale (lines 732-746) transforms the patch via TranslationPatchUtil.handleTranslationPatch() but then passes it to the regular patchInternal(uriInfo, securityContext, fqn, processedPatch, changeSource).

The regular patch path calls repository.patch() which fetches the entity through the standard flow. Unlike patchWithTranslations(), the standard path does NOT initialize the translations field on the entity before applying the patch. The transformed patch contains operations targeting /translations/translations/- (append to translations array), but this path won't exist on the entity if translations have never been set, causing the JSON Patch operation to fail.

This is inconsistent with the by-ID variant at line 649 which correctly calls repository.patchWithTranslations(). The by-FQN variant must follow the same pattern to work correctly.
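
To illustrate why the append fails: under RFC 6902, an "add" to /translations/translations/- requires both the outer object and the inner array to exist already. The sketch below shows the kind of initialization step the standard path is missing, using plain maps as a stand-in for the entity JSON. `TranslationsInitializer` and `ensureTranslationsArray` are hypothetical names, and the map-based model is an assumption chosen to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; models the entity JSON as nested maps. */
final class TranslationsInitializer {

  /** Creates /translations and /translations/translations when absent. */
  @SuppressWarnings("unchecked")
  static List<Object> ensureTranslationsArray(Map<String, Object> entityJson) {
    Object container = entityJson.get("translations");
    if (!(container instanceof Map)) {
      container = new HashMap<String, Object>();
      entityJson.put("translations", container);
    }
    Map<String, Object> containerMap = (Map<String, Object>) container;
    Object array = containerMap.get("translations");
    if (!(array instanceof List)) {
      array = new ArrayList<Object>();
      containerMap.put("translations", array);
    }
    // The returned list is the live array that a "/-" add would append to.
    return (List<Object>) array;
  }
}
```

Running an equivalent step before applying the transformed patch would make the by-FQN path behave like the by-ID path for entities that have never had translations.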

⚠️ Bug: Translation PATCH always appends, creating duplicate entries

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/TranslationPatchUtil.java:211

convertToTranslationPatch() at lines 213-217 always generates an "add" operation with path /translations/translations/- (append to end of array). It never checks whether a translation for the given locale already exists.

If a user PATCHes displayName for locale "es" twice, the entity will end up with two Translation entries for "es" in the translations array. Over time, repeated updates will accumulate unbounded duplicates for the same locale, causing:

  1. Data corruption — findTranslation() uses .stream().findFirst() so it will always return the oldest (possibly stale) translation
  2. Unbounded storage growth in the entity JSON

The fix should either: (a) find and replace the existing translation for the same locale instead of appending, or (b) add a deduplication step in patchWithTranslations() after applying the patch to merge/replace translations with the same locale.
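
Option (a) could look roughly like the following upsert, which replaces the entry for a locale when one exists and appends otherwise. This is a sketch under assumptions: the `Translation` record is a stand-in for the schema type, `TranslationUpsert.upsert` is a hypothetical helper, and merging null fields from the existing entry is one possible design choice, not something the PR specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical helper; keeps at most one translation per locale. */
final class TranslationUpsert {
  /** Stand-in for the schema's translation entry (assumed shape). */
  record Translation(String locale, String displayName, String description) {}

  static List<Translation> upsert(List<Translation> existing, Translation incoming) {
    List<Translation> result = new ArrayList<>();
    boolean replaced = false;
    for (Translation current : existing) {
      if (!Objects.equals(current.locale(), incoming.locale())) {
        result.add(current);
        continue;
      }
      if (!replaced) {
        // Merge: fields left null by the incoming patch keep their old value.
        result.add(new Translation(
            current.locale(),
            incoming.displayName() != null ? incoming.displayName() : current.displayName(),
            incoming.description() != null ? incoming.description() : current.description()));
        replaced = true;
      }
      // Any further entries for the same locale are dropped as duplicates.
    }
    if (!replaced) {
      result.add(incoming);
    }
    return result;
  }
}
```

Because the loop also drops later duplicates, the same helper could serve as the deduplication step described in option (b).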

⚠️ Bug: patchWithTranslations bypasses EntityUpdater and change tracking

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:2242

patchWithTranslations() bypasses the normal EntityUpdater flow by directly calling:

  • updated.setVersion(EntityUtil.nextVersion(original.getVersion())) — always bumps the minor version without considering whether it's a major change
  • storeEntity(updated, true) — direct store without the EntityUpdater's change description generation, field-level diff tracking, or optimistic concurrency control (no If-Match/version check)
  • postUpdate(original, updated) — creates change events but with no ChangeDescription (the entity's change description is never populated)

This means:

  1. No optimistic locking — concurrent translation patches can silently overwrite each other (last-write-wins)
  2. No change description — the entity version history won't have details about what changed in translation patches, breaking audit trail
  3. Version always increments — even if the patch makes no effective change, the version bumps, polluting version history

Consider routing translation patches through the standard patch() method with the translations field properly initialized, or at minimum adding version conflict detection.
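
As a rough illustration of the "at minimum" option, the check below rejects a stale write and skips the version bump when nothing changed. It is a sketch only: `TranslationPatchGuard`, `resolveVersion`, and `VersionConflictException` are hypothetical, and it assumes entity versions are decimal numbers that advance by 0.1 for a minor change.

```java
import java.util.Objects;

/** Hypothetical guard; not the project's EntityUpdater. */
final class TranslationPatchGuard {
  static final class VersionConflictException extends RuntimeException {
    VersionConflictException(String message) {
      super(message);
    }
  }

  /**
   * Returns the version to store. expectedVersion is the version the client
   * last read (null when the client sent none).
   */
  static double resolveVersion(
      double storedVersion, Double expectedVersion, Object before, Object after) {
    if (expectedVersion != null && Double.compare(expectedVersion, storedVersion) != 0) {
      throw new VersionConflictException(
          "Expected version " + expectedVersion + " but found " + storedVersion);
    }
    if (Objects.equals(before, after)) {
      return storedVersion; // no effective change: keep the version history clean
    }
    // Assumed minor bump of 0.1, rounded to one decimal place.
    return Math.round((storedVersion + 0.1) * 10.0) / 10.0;
  }
}
```

This covers findings 1 and 3 only; populating the change description would still require going through the normal update flow.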

⚠️ Performance: Debug LOG.info on hot paths degrades performance for all users

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityDAO.java:598 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:1151 📄 openmetadata-service/src/main/java/org/openmetadata/service/util/TranslationUtil.java:31

There are more than 30 LOG.info() statements with "DEBUG" prefixes scattered across EntityDAO.jsonToEntity(), EntityRepository.setFieldsInBulk(), EntityRepository.listAfter(), EntityRepository.get(), EntityRepository.getByName(), TranslationUtil.applyTranslations(), and related methods. These are on the critical path for every entity read operation.

Key concerns:

  1. EntityDAO.jsonToEntity() (line 599) — called for EVERY entity deserialization, does json.contains("\"translations\"") string scan on every entity load
  2. EntityRepository.setFieldsInBulk() (lines 1151-1174) — streams translation locales for every entity in every list operation
  3. TranslationUtil has 15 LOG.info() calls that fire on every entity retrieval when locale is specified, including per-column/per-field logging

These are clearly development debugging statements that should be removed or downgraded to LOG.debug() before merging. At INFO level, they will generate enormous log volume in production even when no translations are in use, as they fire on every entity read path.
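
The difference between an eager INFO message and a deferred debug-level message can be sketched as below. To stay self-contained this uses the JDK's java.util.logging (Level.FINE with a Supplier) rather than the project's own logger, so it is an analogy for LOG.debug(), not a drop-in change. `TranslationLogging` and the `messagesBuilt` counter are hypothetical and exist only to make the laziness observable.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical example; uses JDK logging in place of the project's logger. */
final class TranslationLogging {
  static final Logger LOG = Logger.getLogger(TranslationLogging.class.getName());

  // Demonstration only: counts how often the message string is actually built.
  static int messagesBuilt = 0;

  static void logLocales(List<String> locales) {
    // The supplier runs only when FINE is enabled, so the join is skipped
    // entirely at the default INFO level.
    LOG.log(Level.FINE, () -> {
      messagesBuilt++;
      return "Translation locales: " + String.join(",", locales);
    });
  }
}
```

At INFO level the call costs one level check per entity read instead of a string scan or stream over the translations.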

⚠️ Edge Case: Missing translation blanks displayName/description instead of using original

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/TranslationUtil.java:83

When a locale is requested but no translation exists for it, applyTranslations() at lines 88-91 sets the entity's displayName and description to empty strings:

entity.setDisplayName("");
entity.setDescription("");

The same behavior occurs for child columns/fields (lines 236-237, 275-276, 315-316).

This is a destructive behavior that causes data loss in the response. If a user requests locale "ja" for an entity that only has "es" and "fr" translations, they get empty strings instead of the original English content. Most i18n systems use fallback to the default language when a translation is missing, rather than blanking out the content.

The PR description says this is "to signal translation can be provided," but this creates a terrible user experience — any UI component displaying this entity with an unsupported locale will show blank content, and API consumers must handle the empty-string case specially to fall back to re-fetching with locale=en.
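
A non-destructive alternative is to overwrite a field only when a translation for it actually exists. The sketch below assumes a simplified `Entity` with two string fields and a map of translations keyed by locale; both types and the `TranslationFallback.apply` method are hypothetical stand-ins, not the PR's classes.

```java
import java.util.Map;

/** Hypothetical sketch; keeps original values when no translation exists. */
final class TranslationFallback {
  /** Stand-in for a stored translation (assumed shape). */
  record Translation(String displayName, String description) {}

  /** Stand-in for an entity with translatable fields. */
  static final class Entity {
    String displayName;
    String description;

    Entity(String displayName, String description) {
      this.displayName = displayName;
      this.description = description;
    }
  }

  static void apply(Entity entity, Map<String, Translation> byLocale, String locale) {
    if (locale == null) {
      return; // nothing requested: keep the original content
    }
    Translation translation = byLocale.get(locale);
    if (translation == null) {
      return; // e.g. "ja" requested but only "es"/"fr" stored: keep the original
    }
    if (translation.displayName() != null && !translation.displayName().isEmpty()) {
      entity.displayName = translation.displayName();
    }
    if (translation.description() != null && !translation.description().isEmpty()) {
      entity.description = translation.description();
    }
  }
}
```

If clients need to know that a translation is missing, that could be signalled separately from the field values rather than by blanking them.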

💡 Quality: Unused variable opType in convertToTranslationPatch

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/TranslationPatchUtil.java:67

In convertToTranslationPatch(), line 67 declares String opType = op.getString("op"), but the variable is never used, so the operation type is never checked when processing displayName/description patches.

As a result, a "replace" op is handled exactly like an "add" (the value is extracted and put into the translation object), while a "remove" op, which has no "value" field, is silently skipped by the op.containsKey("value") check. At minimum, opType should be used to distinguish add/replace from remove operations.
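
A possible shape for that distinction is sketched below, with each patch operation modelled as a plain map. `PatchOpClassifier`, its `Action` enum, and the decision to treat "remove" as clearing the translated field are assumptions for illustration; the PR does not define the intended semantics of "remove".

```java
import java.util.Map;

/** Hypothetical classifier for patch operations on translatable fields. */
final class PatchOpClassifier {
  enum Action { SET_TRANSLATION, CLEAR_TRANSLATION, IGNORE }

  static Action classify(Map<String, Object> op) {
    String opType = String.valueOf(op.get("op"));
    String path = String.valueOf(op.get("path"));
    boolean translatable = path.equals("/displayName") || path.equals("/description");
    if (!translatable) {
      return Action.IGNORE;
    }
    switch (opType) {
      case "add":
      case "replace":
        // Both carry a value and map to setting the translated field.
        return op.containsKey("value") ? Action.SET_TRANSLATION : Action.IGNORE;
      case "remove":
        // Assumed semantics: remove clears the translated field for the locale.
        return Action.CLEAR_TRANSLATION;
      default:
        return Action.IGNORE;
    }
  }
}
```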

@github-actions
Contributor

Jest test Coverage

UI tests summary

  • Lines: 65%
  • Statements: 65.69% (56271/85657)
  • Branches: 45.12% (29422/65215)
  • Functions: 47.94% (8891/18548)

Labels

  • backend
  • safe to test (Add this label to run secure GitHub workflows on PRs)
