@lexfrei commented Nov 25, 2025

Describe your changes:

Fixes #24546

Add support for specifying multiple Salesforce objects to ingest via the new sobjectNames array field. This addresses a common use case where users want to ingest metadata from specific objects (e.g., 20 objects out of 1000+) without having to either:

  • Ingest all objects and filter them with regex patterns
  • Configure multiple separate ingestion pipelines

The new field follows the same pattern as BigQuery's taxonomyProjectID array field.

Breaking Change: The existing sobjectName (string) field has been removed and replaced with sobjectNames (array). Migration scripts automatically convert existing configurations.

Priority order:

  1. sobjectNames (array) - if specified, use only these objects
  2. All objects from describe() - if not specified

tableFilterPattern is applied in all cases as a final filter.
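
The priority order above can be sketched in a few lines of Python. This is a minimal illustration, not the connector's code: `select_sobjects` and its parameters are hypothetical names, and the include/exclude handling is a simplified stand-in for how `tableFilterPattern` is evaluated.

```python
import re
from typing import Iterable, List, Optional


def select_sobjects(
    sobject_names: Optional[List[str]],
    described_objects: Iterable[str],
    include_patterns: Optional[List[str]] = None,
    exclude_patterns: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper mirroring the priority order described in this PR."""
    # 1. A non-empty sobjectNames list wins.
    # 2. Otherwise fall back to every object reported by describe().
    candidates = list(sobject_names) if sobject_names else list(described_objects)

    def passes_filter(name: str) -> bool:
        # Simplified stand-in for tableFilterPattern (an assumption):
        # an exclude match rejects the name; if includes exist, one must match.
        if exclude_patterns and any(re.match(p, name) for p in exclude_patterns):
            return False
        if include_patterns:
            return any(re.match(p, name) for p in include_patterns)
        return True

    # The filter is applied last, whichever branch produced the candidates.
    return [name for name in candidates if passes_filter(name)]
```

With `sobject_names=["Contact", "Lead"]` and an exclude pattern of `^Lead$`, only `Contact` would remain, which is the "final filter" behaviour described above.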

Changes:

  • Removed sobjectName field and added sobjectNames array field to JSON Schema
  • Added database migration scripts for MySQL and PostgreSQL (version 1.11.8) to migrate existing configurations
  • Updated Python connector to handle the new field with proper priority logic
  • Added UI documentation for the new field
  • Added unit tests for configuration parsing and priority logic
  • Updated example workflow

Type of change:

  • New feature
  • Breaking change

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Note on migrations: Migration scripts have been added for both MySQL and PostgreSQL in version 1.11.8. These scripts automatically convert any existing sobjectName (string) values to sobjectNames (array) format, ensuring backward compatibility during upgrade.

  • The issue properly describes why the new feature is needed, what's the goal, and how we are building it. Any discussion or decision-making process is reflected in the issue.
  • I have updated the documentation.
  • I have added tests around the new logic.
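
For reviewers who want to reason about the migration without reading SQL, the conversion described in the migration note can be approximated in Python. The actual migrations are SQL scripts that are not reproduced here; `migrate_sobject_name` is a hypothetical helper, and its handling of empty values and of configurations that already carry `sobjectNames` is an assumption.

```python
import copy
from typing import Any, Dict


def migrate_sobject_name(connection_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with sobjectName (string) converted to sobjectNames (array)."""
    migrated = copy.deepcopy(connection_config)
    # The legacy key is always dropped, since the field no longer exists in the schema.
    legacy_value = migrated.pop("sobjectName", None)
    # Assumption: only build the array when a legacy value existed
    # and the new field has not been set already.
    if legacy_value and "sobjectNames" not in migrated:
        migrated["sobjectNames"] = [legacy_value]
    return migrated
```

Under these assumptions, a configuration containing `"sobjectName": "Contact"` would come out with `"sobjectNames": ["Contact"]`, and a configuration without the legacy key would be returned unchanged.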

lexfrei and others added 2 commits November 25, 2025 14:21
Add support for specifying multiple Salesforce objects to ingest
instead of just one or all. The new `sobjectNames` array field
allows users to select specific objects (e.g., Contact, Account,
Lead) without having to ingest all objects and filter them.

Priority order:
1. sobjectNames (array) - if specified, use only these
2. sobjectName (string) - if specified and sobjectNames empty
3. All objects from describe() - if neither specified

tableFilterPattern applies in all cases as a final filter.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Aleksei Sviridkin <[email protected]>
@github-actions
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@github-actions
Contributor

⚠️ Generated types need to be updated.
The generated TypeScript types cannot be automatically committed from a forked repository.
Please run the type generation locally and commit the changes manually.

To generate the types locally, run:

cd openmetadata-ui/src/main/resources/ui
./json2ts-generate-all.sh -l true

@keshavmohta09 added the "Ingestion", "safe to test", and "To release" labels on Jan 28, 2026

gitar-bot bot commented Jan 30, 2026

🔍 CI failure analysis for e140dc8: Sixteen CI failures are unrelated to this PR's Salesforce connector changes. Issues: external Collate build, Maven build failures (missing schema files), Java integration tests (database schema, NullPointerException, flaky test), Playwright E2E tests (5 of 6 shards failing - widespread systemic E2E instability), Python integration tests, and cascading Test Report failures. All are pre-existing systemic issues.

Issues

Sixteen CI jobs failed:

  1. maven-collate-ci - External Collate build trigger
  2. maven-postgresql-ci - Maven build with tests
  3. maven-sonarcloud-ci - Maven build with tests
  4. integration-tests-postgres-opensearch - Java integration tests
  5. integration-tests-mysql-elasticsearch - Java integration tests
  6. Test Report (x4) - Dependent failures
  7. playwright-ci-postgresql (2, 3, 4, 5, 6 of 6) - E2E frontend tests (5 of 6 shards)
  8. py-run-tests (3.10 x2, 3.11) - Python integration tests

Root Cause

None of the failures are related to this PR's changes. This PR only modifies Salesforce connector files.

Failure 1: maven-collate-ci

  • Cause: Infrastructure/external dependency issue

Failures 2 & 3: maven-postgresql-ci and maven-sonarcloud-ci

  • Error: Schema file not found for multiple entity types
  • Solution: Missing schema files should be added or SchemaFieldExtractor should handle missing schemas gracefully

Failures 4 & 5: Java integration-tests

  • a) Database Schema Issue: varchar(255) too small for entityFQNHash
  • b) NullPointerException: Missing null checks for EntityReference.getType()
  • c) Flaky Test: AppsResourceIT race condition

Failures 7-11: playwright-ci-postgresql (5 of 6 shards failing)

  • Cause: E2E test flakiness (page/context closed, timeouts)
  • Note: Widespread failure across 5 shards indicates systemic E2E test stability issues

Failure 12: py-run-tests

  • Failed Tests: Trino classifier, PostgreSQL lineage

Failure 6: Test Report (x4)

  • Cause: Cascading failures

Summary

All sixteen failures are pre-existing systemic issues unrelated to this PR's Salesforce connector changes.

Code Review ✅ Approved 3 resolved / 3 findings

Well-implemented feature to support multi-object selection in Salesforce connector. Migration scripts correctly handle the schema change, Python logic is clear, and tests provide good coverage.

✅ 3 resolved
Bug: Test passes wrong argument type to SalesforceSource.create()

📄 ingestion/tests/unit/topology/database/test_salesforce.py:602
In test_sobject_names_config, the test passes config.workflowConfig.openMetadataServerConfig directly to SalesforceSource.create(), but other tests in the same file wrap it with OpenMetadata(config=...).

Compare lines 602-603 with lines 618-619:

# test_sobject_names_config (line 602-603):
salesforce_source = SalesforceSource.create(
    mock_salesforce_multi_objects_config["source"],
    config.workflowConfig.openMetadataServerConfig,  # <-- raw config
)

# test_ingestion_with_sobject_names_list (line 618-619):
salesforce_source = SalesforceSource.create(
    mock_salesforce_multi_objects_config["source"],
    OpenMetadata(config=config.workflowConfig.openMetadataServerConfig),  # <-- wrapped
)

This inconsistency suggests test_sobject_names_config may be passing the wrong argument type, which could cause the test to fail or behave unexpectedly depending on how the underlying API handles the type mismatch.

Suggested fix: Wrap the config with OpenMetadata(config=...) consistently:

salesforce_source = SalesforceSource.create(
    mock_salesforce_multi_objects_config["source"],
    OpenMetadata(config=config.workflowConfig.openMetadataServerConfig),
)

Quality: Priority comment skips number 2 (goes from 1 to 3)

📄 ingestion/src/metadata/ingestion/source/database/salesforce/metadata.py:165
The docstring comment in get_tables_name_and_type() has an inconsistent numbered list that skips from 1 to 3:

Priority:
1. sobjectNames (array) - if specified, iterate over these
3. All objects from describe()  # <-- Should be 2

This is likely a copy-paste error from removing a previous option.

Suggested fix: Update to use sequential numbering:

Priority:
1. sobjectNames (array) - if specified, iterate over these
2. All objects from describe()

Bug: MySQL migration missing newline at end of file

📄 bootstrap/sql/migrations/native/1.11.8/mysql/schemaChanges.sql:21
The MySQL migration file schemaChanges.sql is missing a newline at the end of the file (as indicated by \ No newline at end of file). While this might seem minor, it can cause issues with:

  1. Some SQL tools that require a newline terminator
  2. Git diff noise in future commits
  3. Concatenation issues if multiple migration files are joined

Similarly, the PostgreSQL migration file also lacks a trailing newline.

Suggested fix: Add a newline character at the end of both migration files.
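
One way to apply this fix without opening an editor is a small script that appends the terminator only when it is missing. This is a sketch, not part of the PR: `ensure_trailing_newline` is a hypothetical helper, and the caller is assumed to pass the paths of the two migration files.

```python
from pathlib import Path


def ensure_trailing_newline(path: Path) -> bool:
    """Append a newline if the file lacks one. Returns True when the file was changed."""
    data = path.read_bytes()
    # Empty files and files that already end with a newline are left untouched.
    if not data or data.endswith(b"\n"):
        return False
    path.write_bytes(data + b"\n")
    return True
```

Running it twice on the same file is harmless: the second call finds the newline already present and returns False.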

Development

Successfully merging this pull request may close these issues.

Salesforce connector: add support for selecting multiple specific objects

6 participants