feat: Bump langchain dependencies to 1.0.x #810
- Update langchain from 0.1.16 to ^1.0.2
- Update langchain_core from 0.1.42 to ^1.0.0
- Add langchain_community ^0.4 as explicit dependency
- Update imports to use new langchain package structure:
  - langchain_community.embeddings for embedding classes
  - langchain_text_splitters for text splitting
  - langchain_core.utils.strings for utility functions
- Fix CohereEmbeddings initialization to include required user_agent parameter
- All fast tests passing (3800 passed)

Co-Authored-By: AJ Steers <[email protected]>
👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1761176353-bump-langchain#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1761176353-bump-langchain
Pull Request Overview
This PR upgrades the langchain dependencies from 0.1.x to 1.0.x, migrating to the new package structure introduced in langchain's major version update. The changes update import paths across the codebase to align with langchain 1.0's modularized architecture.
- Bumped core langchain packages to 1.0.x versions and added langchain_community as explicit dependency
- Updated import statements to use new package structure (langchain_community, langchain_text_splitters, langchain_core.utils.strings)
- Added required user_agent parameter to CohereEmbeddings initialization
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pyproject.toml | Updated langchain dependency versions to 1.0.x and added langchain_community to extras |
| airbyte_cdk/destinations/vector_db_based/embedder.py | Migrated embedding imports to langchain_community and added user_agent to CohereEmbeddings |
| airbyte_cdk/destinations/vector_db_based/document_processor.py | Updated text splitter imports to langchain_text_splitters and utils to langchain_core |
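
The import moves summarized in this review can be sketched as a small line-rewriting helper. This is only an illustrative sketch, not code from the PR: `IMPORT_MOVES` and `rewrite_import` are hypothetical names, and the mapping is copied from the PR description (`langchain.embeddings.*`, `langchain.text_splitter`, `langchain.utils`). It handles only simple single-line `from ... import ...` statements.

```python
import re

# Old module prefix -> new module, as listed in the PR description.
# Submodules of an old prefix collapse onto the new module.
IMPORT_MOVES = {
    "langchain.embeddings": "langchain_community.embeddings",
    "langchain.text_splitter": "langchain_text_splitters",
    "langchain.utils": "langchain_core.utils.strings",
}

# Matches single-line "from <module> import <names>" statements.
_FROM_IMPORT = re.compile(r"^(\s*from\s+)([\w.]+)(\s+import\s+.*)$")


def rewrite_import(line: str) -> str:
    """Rewrite one import line (no trailing newline) to the new module path.

    Lines that are not 'from X import Y' statements, or whose module is not
    in IMPORT_MOVES, are returned unchanged.
    """
    match = _FROM_IMPORT.match(line)
    if not match:
        return line
    prefix, module, rest = match.groups()
    for old, new in IMPORT_MOVES.items():
        if module == old or module.startswith(old + "."):
            return f"{prefix}{new}{rest}"
    return line
```

A helper like this is only a starting point for a migration; multi-line imports and plain `import langchain...` statements would still need manual review.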
Co-Authored-By: AJ Steers <[email protected]>
Note: Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Updates move LangChain-related imports to new modular packages (…)

Sequence Diagram(s)

(No sequence diagram generated; the changes are import/dependency updates and a single constructor parameter addition without new control flow.)

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

Quick question: Have you verified that the added (…)

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
- Remove unused langchain dependency
- Add langchain_text_splitters as explicit dependency
- Update vector-db-based extras to include langchain_text_splitters
- Fixes deptry dependency analysis errors

Co-Authored-By: AJ Steers <[email protected]>
Actionable comments posted: 2
♻️ Duplicate comments (2)

airbyte_cdk/destinations/vector_db_based/embedder.py (1)
145-147: User agent version info.
As mentioned in a previous review, the hardcoded user_agent='airbyte-cdk' could benefit from including version information for better tracking in Cohere's logs.

pyproject.toml (1)
71-71: Outdated comment about openai dependency.
As noted in a previous review, the comment about openai being used indirectly by langchain is outdated with langchain 1.0.x.
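
One way to act on the user-agent suggestion is to derive the version from installed package metadata. The sketch below is an assumption, not the PR's code: `build_user_agent` is a hypothetical helper, the `name/version` format is a common convention rather than anything Cohere is known to require, and `airbyte-cdk` is assumed to be the installed distribution name.

```python
from importlib.metadata import PackageNotFoundError, version


def build_user_agent(package: str = "airbyte-cdk") -> str:
    """Return '<package>/<version>' when the package is installed.

    Falls back to the bare package name if no distribution metadata
    is available (for example, when running from a source checkout).
    """
    try:
        return f"{package}/{version(package)}"
    except PackageNotFoundError:
        return package
```

The result could then be passed as the `user_agent` argument in place of the hardcoded string, so the value stays in sync with releases without manual edits.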
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
- airbyte_cdk/destinations/vector_db_based/document_processor.py (1 hunks)
- airbyte_cdk/destinations/vector_db_based/embedder.py (2 hunks)
- pyproject.toml (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Dependency Analysis
airbyte_cdk/destinations/vector_db_based/document_processor.py
[error] 13-13: DEP003 'langchain_text_splitters' imported but it is a transitive dependency
pyproject.toml
[error] 1-1: DEP002 'langchain' defined as a dependency but not used in the codebase
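
The two errors above come from comparing what the code imports against what pyproject.toml declares. The following is a deliberately simplified sketch of that comparison, not deptry's implementation: `imported_top_level_modules` and `compare_dependencies` are hypothetical names, and the sketch does not distinguish transitive from missing dependencies, ignore the standard library, or map distribution names to module names.

```python
import ast


def imported_top_level_modules(source: str) -> set[str]:
    """Collect top-level module names from absolute imports in the source."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def compare_dependencies(source: str, declared: set[str]) -> dict[str, list[str]]:
    """Report imports that are not declared and declarations that are not imported."""
    imported = imported_top_level_modules(source)
    return {
        "undeclared": sorted(imported - declared),
        "unused": sorted(declared - imported),
    }
```

Applied to a module that imports `langchain_text_splitters` and `langchain_core` while only `langchain` and `langchain_core` are declared, this reports `langchain_text_splitters` as undeclared and `langchain` as unused, which mirrors the shape of the DEP003 and DEP002 findings.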
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
🔇 Additional comments (2)

airbyte_cdk/destinations/vector_db_based/document_processor.py (1)
12-13: Import paths look good for langchain 1.0.x migration!
The updated import paths for stringify_dict and text splitters align well with the langchain 1.0.x package structure. Just note that langchain_text_splitters needs to be added as a declared dependency in pyproject.toml (I've flagged this in my review of that file).

airbyte_cdk/destinations/vector_db_based/embedder.py (1)
10-15: Migrate to dedicated embedding packages to resolve deprecation warnings
OpenAIEmbeddings from langchain_community is deprecated since version 0.0.9 with removal target 1.0, and CohereEmbeddings is deprecated since version 0.0.30. Since you're already doing a major version upgrade, would you consider migrating to the recommended packages now?
- Use langchain_openai.OpenAIEmbeddings instead
- Use langchain_cohere.CohereEmbeddings instead

This will eliminate deprecation warnings and keep the codebase ahead of LangChain 1.0. What do you think?
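
If the migration to dedicated packages were done incrementally, one option is to prefer the dedicated package and fall back to `langchain_community` when it is not installed. This is a sketch under that assumption only: `import_first_available` and the candidate lists are hypothetical and do not appear in the PR, which imports directly from `langchain_community.embeddings`.

```python
import importlib
from typing import Any

# Preferred locations first, deprecated langchain_community location last.
OPENAI_EMBEDDINGS_CANDIDATES = [
    ("langchain_openai", "OpenAIEmbeddings"),
    ("langchain_community.embeddings", "OpenAIEmbeddings"),
]
COHERE_EMBEDDINGS_CANDIDATES = [
    ("langchain_cohere", "CohereEmbeddings"),
    ("langchain_community.embeddings", "CohereEmbeddings"),
]


def import_first_available(candidates: list[tuple[str, str]]) -> Any:
    """Return the first attribute importable from the (module, attribute) pairs.

    Raises ImportError listing every failed attempt if none can be imported.
    """
    errors = []
    for module_name, attr in candidates:
        try:
            return getattr(importlib.import_module(module_name), attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attr}: {exc}")
    raise ImportError("; ".join(errors))
```

Nothing is imported until `import_first_available` is called, so the lookup can be deferred to the point where an embedder is actually constructed.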
Actionable comments posted: 0
🧹 Nitpick comments (1)

pyproject.toml (1)
71-71: Consider updating the openai dependency comment for clarity with langchain 1.0.x.
The comment states openai is "Used indirectly by langchain library", but with langchain 1.0.x's modular architecture, this relationship has changed. OpenAI embeddings now come from the dedicated langchain-openai package (per deprecation warnings noted in the PR).

Should we clarify the comment to reflect that this is an optional dependency for embedding support, and possibly note that langchain-openai is the recommended approach going forward? This would help future maintainers understand the dependency rationale, wdyt?

-openai = { version = "0.27.9", extras = ["embeddings"], optional = true } # Used indirectly by langchain library
+openai = { version = "0.27.9", extras = ["embeddings"], optional = true } # Optional: Used for embedding support. With langchain 1.0.x, langchain-openai package is the recommended approach for OpenAI embeddings.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
- pyproject.toml (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Check: source-shopify
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Pytest (Fast)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
🔇 Additional comments (1)

pyproject.toml (1)
67-69: Modular langchain 1.0.x architecture properly reflected.
The migration to the modular ecosystem looks solid. All three required packages are declared with appropriate version constraints, and the vector-db-based extras have been updated correctly to include them. The removal of the langchain metapackage aligns with the 1.0.x architecture shift.
Also applies to: 125-125

PyTest Results (Full): 3 805 tests, 3 793 ✅, 11m 42s ⏱️. Results for commit 5e19d46.
Good question! Looking at the current code and deprecation warnings: Current state:
Recommendation:
Changes needed:
Should I make these changes now, or would you prefer to handle this in a follow-up PR? |
- Update CDK dependency to use local path (langchain upgrade branch)
- Temporarily comment out PyAirbyte dependency (will restore after CDK PR merges)
- Update Python version constraint to ^3.10,<3.13
- Run poetry lock to update dependencies

This is Stage 3 of the langchain upgrade plan:
- Stage 1: ✅ Bump pgvector to latest CDK (skipped, handled by PyAirbyte)
- Stage 2: ✅ Bump langchain in CDK (PR #810)
- Stage 3: 🔄 Update pgvector to use new CDK with updated langchain

Related PR: airbytehq/airbyte-python-cdk#810

Co-Authored-By: AJ Steers <[email protected]>
Devin, please reply here in a comment. What are the latest versions of all libraries we are updating? Please link to their PyPI pages and/or to their GitHub releases pages.
Here are the latest versions of all libraries we're updating: Core LangChain Packages:
Removed:
Version Constraints Used:
Breaking Changes:
All changes have been addressed in this PR with updated imports and parameters.
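
The caret constraints used in this PR (`^1.0.2`, `^1.0.0`, `^0.4`) follow Poetry's caret rule: updates are allowed as long as the left-most non-zero version component does not change. The sketch below expands a caret constraint into its lower and exclusive upper bound. `caret_bounds` is a hypothetical helper written for illustration; it assumes purely numeric versions with at most three components and is not Poetry's own parser.

```python
def caret_bounds(spec: str) -> tuple[str, str]:
    """Expand a caret constraint into (inclusive lower, exclusive upper) bounds.

    Assumes a purely numeric version with one to three components.
    """
    parts = [int(p) for p in spec.lstrip("^").split(".")]
    padded = parts + [0] * (3 - len(parts))

    # Bump the left-most non-zero component that was written out;
    # if every written component is zero, bump the last written one.
    bump_index = len(parts) - 1
    for index, value in enumerate(parts):
        if value != 0:
            bump_index = index
            break

    upper = padded[:bump_index] + [padded[bump_index] + 1] + [0] * (2 - bump_index)
    return ".".join(map(str, padded)), ".".join(map(str, upper))
```

Under this rule `^1.0.2` permits any 1.x release at or above 1.0.2, while `^0.4` is narrower and permits only 0.4.x releases, which matters for a pre-1.0 package such as langchain_community.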
@dbgold17 - Do you mind reviewing and approving if things look okay? Context:
Important: This also clears our path for the |
dbgold17
left a comment
LGTM. If we can go all the way to 1.0.x without issue, let's do that instead of my PR to go to 0.3.0.
|
@aaronsteers I looked into the shopify error and my best guess is that the SSL failure is related to this dummy shop name defined in an invalid config file. The test produces the same result for me locally running on master as well, however for some reason it still marks it as |
feat: Bump langchain dependencies to 1.0.x
Summary
Upgraded langchain dependencies from 0.1.x to 1.0.x as part of a multi-stage plan to modernize the CDK's vector database dependencies. This is Stage 2 of the planned work.
Changes:
- Updated langchain from 0.1.16 to ^1.0.2
- Updated langchain_core from 0.1.42 to ^1.0.0
- Added langchain_community ^0.4 as an explicit dependency (was previously only transitive)
- Updated imports in embedder.py and document_processor.py to use new langchain 1.0 package structure:
  - langchain.embeddings.* → langchain_community.embeddings
  - langchain.text_splitter → langchain_text_splitters
  - langchain.utils → langchain_core.utils.strings
- Fixed CohereEmbeddings initialization to include required user_agent parameter

Test Results:

Review & Testing Checklist for Human
- CohereEmbeddings and OpenAIEmbeddings should be imported from dedicated packages (langchain-cohere, langchain-openai) rather than langchain_community. Should we update to use those packages instead? See test output warnings.
- git grep -r "from langchain\." --include="*.py"

Recommended Test Plan
- poetry install --all-extras

Notes
- user_agent="airbyte-cdk" parameter was added to CohereEmbeddings - consider if this should include version info

Session Info:
Important
Auto-merge enabled.
This PR is set to merge automatically when all requirements are met.
Note
Auto-merge may have been disabled. Please check the PR status to confirm.