feat(ingestion): Add Pinecone Vector DB Source with skill_docs #datahub-skills #16472

Open
jishanahmed-shaikh wants to merge 42 commits into datahub-project:master from jishanahmed-shaikh:master

@jishanahmed-shaikh

This PR introduces a production-grade Pinecone ingestion source, developed and verified using the datahub-skills framework with a 9.9/10 review score.

🚀 Key Features:

  • Schema Inference: Implements a three-tier sampling strategy to detect metadata types and generate virtual schemas for vector collections.
  • Production Hardening: Includes @with_retry for exponential backoff on rate limits and @lru_cache for connection reuse across ingestion runs.
  • Verified Architecture: Maps Pinecone Index $\rightarrow$ Namespace $\rightarrow$ Dataset hierarchy to DataHub containers with proper URN generation.
  • Skill Documentation: Includes all required skill_docs/ artifacts (Planning, Implementation, and Final Review) in the connector directory.
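
The type-detection step behind the schema inference bullet can be illustrated with a minimal sketch. It assumes the metadata records have already been sampled (how the three tiers select samples is not described here) and that values are strings, numbers, booleans, or lists. The names `type_name` and `infer_schema` are hypothetical and are not taken from the connector's code.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, Mapping


def type_name(value: Any) -> str:
    """Map a Python value to a coarse schema type (hypothetical helper)."""
    # bool is checked first because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "unknown"


def infer_schema(samples: Iterable[Mapping[str, Any]]) -> Dict[str, str]:
    """Return the most frequently observed type for each metadata field."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for metadata in samples:
        for key, value in metadata.items():
            counts[key][type_name(value)] += 1
    # Fields are sorted by name so the resulting schema is stable across runs
    return {key: seen.most_common(1)[0][0] for key, seen in sorted(counts.items())}
```

Because vectors in the same namespace may carry different metadata keys, a field that appears in only some samples still ends up in the inferred schema.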

#datahub-skills
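
For readers unfamiliar with the retry and caching pattern named in the description, here is a rough sketch of how a `with_retry` decorator and an `lru_cache`-backed client factory might look. The decorator's parameters, the `RateLimitError` class, and `get_client` are assumptions made for illustration; the PR's actual signatures may differ.

```python
import functools
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's rate-limit error."""


def with_retry(max_attempts=3, base_delay=0.5, retry_on=(RateLimitError,), sleep=time.sleep):
    """Retry the wrapped call, doubling the delay after each failure."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    # Give up and re-raise once the last attempt has failed
                    if attempt == max_attempts - 1:
                        raise
                    # Exponential backoff: base_delay, 2x, 4x, ...
                    sleep(base_delay * 2 ** attempt)
            raise ValueError("max_attempts must be at least 1")

        return wrapper

    return decorator


@functools.lru_cache(maxsize=None)
def get_client(api_key: str) -> dict:
    """Return one cached client object per API key (placeholder dict here)."""
    return {"api_key": api_key}
```

Note that `functools.lru_cache` keeps its entries in process memory, so in this sketch the client is reused only for as long as the Python process lives.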

@github-actions bot added the ingestion (PR or Issue related to the ingestion of metadata) and community-contribution (PR or Issue raised by member(s) of DataHub Community) labels Mar 7, 2026
@maggiehays added the needs-review label (Label for PRs that need review from a maintainer) Mar 7, 2026

- Add capability decorators for PLATFORM_INSTANCE, DOMAINS, CONTAINERS
- Remove unused imports (ContainerClass, ContainerPropertiesClass, Dict)
- Fix F541: Remove unnecessary f-string prefix in pinecone_client.py
- Fix SIM114: Combine isinstance checks in schema_inference.py
- Remove unused variable assignments in test_pinecone_source.py
- Format markdown documentation with prettier
- Register Pinecone in connector registry with all 5 capabilities

Fixes PR datahub-project#16472
…ation formatting

- Add missing 'Dict' import to pinecone_source.py (fixes NameError)
- Remove unused 'Dict' import from report.py
- Re-sort import blocks in test_pinecone_source.py
- Apply final Prettier formatting to skill_docs/*.md

All local ruff checks passed.
@sgomezvillamor
Contributor

What about adding some integration tests with Pinecone emulator available as a Docker image? https://docs.pinecone.io/guides/operations/local-development

@jishanahmed-shaikh
Author

> What about adding some integration tests with Pinecone emulator available as a Docker image? https://docs.pinecone.io/guides/operations/local-development

Sounds good, giving it a try.

javabrett and others added 14 commits March 10, 2026 21:47
…afety (datahub-project#16444)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…nts (datahub-project#16452)

Co-authored-by: John Joyce <john@ip-192-168-1-212.us-west-2.compute.internal>
Co-authored-by: John Joyce <john@Mac-3236.lan>
Co-authored-by: John Joyce <john@ip-192-168-1-212.us-west-2.compute.internal>
Co-authored-by: John Joyce <john@Mac-3094.lan>
Co-authored-by: John Joyce <john@Mac-3608.lan>
@codecov

codecov bot commented Mar 30, 2026

Bundle Report

Changes will increase total bundle size by 27 bytes (0.0%) ⬆️. This is within the configured threshold ✅

Detailed changes
| Bundle name | Size | Change |
| --- | --- | --- |
| datahub-react-web-esm | 22.71MB | 27 bytes (0.0%) ⬆️ |

Affected Assets, Files, and Routes:

view changes for bundle: datahub-react-web-esm

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| assets/index-*.js | 27 bytes | 12.46MB | 0.0% |

@codecov

codecov bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Labels

- community-contribution: PR or Issue raised by member(s) of DataHub Community
- depot
- ingestion: PR or Issue related to the ingestion of metadata
- needs-review: Label for PRs that need review from a maintainer.
