@regaltsui
Contributor

@regaltsui regaltsui commented Mar 11, 2025

What does this PR do?

This PR adds a LangChain embeddings class as a backend for BERTopic. The LangChain backend supports LangChain-compatible Embeddings, including third-party integrations.
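
As a rough illustration of the idea (a sketch, not the code added in this PR): a backend of this kind can be a thin adapter that forwards documents to any object exposing LangChain's `embed_documents` method and returns one vector per document. The names `EmbeddingsLike`, `ToyEmbeddings`, and `LangChainBackendSketch` are hypothetical and exist only for this example. The toy embedder stands in for a real LangChain model so the snippet runs without extra dependencies, and a real backend would presumably return a NumPy array rather than nested lists.

```python
from typing import List, Protocol, Sequence


class EmbeddingsLike(Protocol):
    """Assumed minimal shape of a LangChain-style embeddings object."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]: ...


class ToyEmbeddings:
    """Hypothetical stand-in so the sketch runs without LangChain installed."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Two crude features per document: character count and word count.
        return [[float(len(t)), float(len(t.split()))] for t in texts]


class LangChainBackendSketch:
    """Hypothetical adapter; NOT the class introduced by this PR."""

    def __init__(self, embedding_model: EmbeddingsLike) -> None:
        self.embedding_model = embedding_model

    def embed(self, documents: Sequence[str], verbose: bool = False) -> List[List[float]]:
        # Delegate to the wrapped model and expect one vector per document.
        vectors = self.embedding_model.embed_documents(list(documents))
        if len(vectors) != len(documents):
            raise ValueError("embedding model returned a wrong number of vectors")
        return vectors


backend = LangChainBackendSketch(ToyEmbeddings())
vectors = backend.embed(["topic models", "embeddings for BERTopic"])
```

Swapping `ToyEmbeddings` for a real LangChain embeddings object should only require that it provides `embed_documents`; which provider package to install depends on the embedding model chosen.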

Github Issue

#2293

Before submitting

  • This PR fixes a typo or improves the docs (if yes, ignore all other checks!).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a Github issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes (if applicable)?
  • Did you write any new necessary tests?

@MaartenGr
Owner

Thanks for the PR!

There was an odd error with the GH-workflow, so I reran the checks. That said, I think we can skip adding LangChain to the pyproject.toml for two reasons. First, there are no tests, so I cannot guarantee that a given version will work. Second, since LangChain would have you install additional dependencies anyway (depending on the embedding model), doing pip install bertopic[langchain] wouldn't be enough.

@regaltsui
Contributor Author

Sure, I have just removed all changes to the pyproject.toml. Let's see how the GH-workflow goes.

@MaartenGr
Owner

Awesome, it seems to be working well. Would you mind adding a small section to the documentation? Placing it above Flair would be nice, considering most users would want to use sbert/transformers first. It doesn't have to be much, so just the docstrings and a few lines to explain what LangChain is and where to find/install it.

@regaltsui regaltsui force-pushed the add_langchain_embedder branch from 9a89d57 to 9111fb5 Compare March 24, 2025 01:32
@MaartenGr
Owner

Thanks for adding the docs. Once the conflicts have been resolved and the tests pass, I'll go ahead and merge this.

@regaltsui regaltsui force-pushed the add_langchain_embedder branch from 6479c72 to 37d1721 Compare March 25, 2025 12:46
@regaltsui
Contributor Author

I have just rebased the branch as I see FastEmbed has been added recently. Let's see if it's passing the tests.

@MaartenGr
Owner

Awesome, thank you for your thoughtful work on this and your quick replies. Great to have this in BERTopic.

@MaartenGr MaartenGr merged commit de250e9 into MaartenGr:master Mar 28, 2025
6 checks passed