Remove dependency on deprecated langchain apis #261
Conversation
refactor how chain prompt is handled
@shengbo-ma Thanks for the PR!
That's correct; like the other LLMs, it is merely used as the main model to parse the documents and/or keywords according to the 5 principles here.
The intent is that users do not need to have a basic understanding of any LLM provider/framework but can just throw in any model (GGUF or otherwise). LangChain is a tricky library to be familiar with considering the quick changes in its API over the years. As such, I would prefer the most straightforward implementation so that users do not have to learn a framework. Compared to the LangChain implementation in BERTopic, I believe the keyword extraction use case is much simpler.
It is actually meant to support an LLM model, considering many users encounter LangChain as one of the first techniques to load a model (rather than llama-cpp-python, ollama, vllm, etc.). That said, I would actually prefer to keep the …
Yes, LangChain needs to be installed beforehand, but it does not need to be included in the pyproject.toml considering there are currently no tests for LangChain. It will give an error if it is not installed: https://github.com/MaartenGr/KeyBERT/blob/master/keybert/llm/__init__.py
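For context, guards like the one linked above typically use a deferred-import pattern so that importing `keybert.llm` still works without the optional package. The sketch below illustrates that pattern only; the module path, class name, and messages are assumptions, not KeyBERT's exact code.

```python
class NotInstalled:
    """Stand-in that raises a helpful error only when the wrapper is actually used."""

    def __init__(self, tool: str, dep: str):
        self.tool = tool
        self.dep = dep

    def __call__(self, *args, **kwargs):
        raise ModuleNotFoundError(
            f"{self.tool} requires the `{self.dep}` package. "
            f"Install it with: pip install {self.dep}"
        )


try:
    from keybert.llm._langchain import LangChain  # assumed internal module path
except ModuleNotFoundError:
    LangChain = NotInstalled("The LangChain wrapper", "langchain")
```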
@MaartenGr Thanks for the info. Will change this PR accordingly.
Will match what BERTopic does for langchain.
Agree. I don't think load_qa_chain is necessary here. A simple and straightforward chain like in this PR should be good enough, and that's one less dependency on langchain.
Got it. Will restore the prompt arg in the next commits.
@MaartenGr Thanks for this clarification. It inspires me to simplify the user experience as below. Please let me know if it makes sense.

Updates

The new implementation:

```python
from langchain_openai import ChatOpenAI
from keybert import KeyLLM
from keybert.llm import LangChain
_llm = ChatOpenAI(
model="gpt-4o",
api_key="my-openai-api-key",
temperature=0,
)
# Create your LLM
llm = LangChain(_llm)
# Load it in KeyLLM
kw_model = KeyLLM(llm)
# Extract keywords
docs = [
"KeyBERT: A minimal method for keyword extraction with BERT. The keyword extraction is done by finding the sub-phrases in a document that are the most similar to the document itself. First, document embeddings are extracted with BERT to get a document-level representation. Then, word embeddings are extracted for N-gram words/phrases. Finally, we use cosine similarity to find the words/phrases that are the most similar to the document. The most similar words could then be identified as the words that best describe the entire document.",
"KeyLLM: A minimal method for keyword extraction with Large Language Models (LLM). The keyword extraction is done by simply asking the LLM to extract a number of keywords from a single piece of text.",
]
keywords = kw_model.extract_keywords(docs=docs)
print("with no candidates")
print(keywords)
# Output:
# [
# ['KeyBERT', 'keyword extraction', 'BERT', 'document embeddings', 'word embeddings', 'N-gram phrases', 'cosine similarity', 'document representation'],
# ['KeyLLM', 'keyword extraction', 'Large Language Models', 'LLM', 'minimal method']
# ]
# fine tune with candidate keywords
candidates = [
["keyword extraction", "Large Language Models", "LLM", "BERT", "transformer", "embeddings"],
["keyword extraction", "Large Language Models", "LLM", "BERT", "transformer", "embeddings"],
]
keywords = kw_model.extract_keywords(docs=docs, candidate_keywords=candidates)
print("with candidates")
print(keywords)
# Output:
# [
# ['keyword extraction', 'BERT', 'document embeddings', 'word embeddings', 'cosine similarity', 'N-gram phrases'],
# ['KeyLLM', 'keyword extraction', 'Large Language Models', 'LLM']
# ]
```
This PR is ready for review @MaartenGr
MaartenGr left a comment
Left the tiniest of comments, but otherwise it looks great!
| """NOTE | ||
| langchain >= 0.1 is required. Which supports: | ||
| - chain.invoke() | ||
| - LangChain Expression Language (LCEL) is used and it is not compatible with langchain < 0.1. | ||
| """ |
Might be best to also update this in the documentation here: https://github.com/MaartenGr/KeyBERT/blob/master/docs/guides/llms.md
Sure. Updated
@MaartenGr Despite that, it still would be better to include it explicitly in `pyproject.toml`.
Thanks for the initiative! There's an edge case for this, namely that it is possible to install KeyBERT without langchain. As such, it will throw an error when using this functionality. I would prefer not using this additional dependency if it is not absolutely needed for the core functionality of KeyBERT (which includes this light-weight installation option).
@MaartenGr Yeah, makes sense.
Awesome, thank you for the work on this. Everything looks in order!
Issue: #259
Description:
- Replace `.run()` with `.invoke()`
- Replace `load_qa_chain` with a simple chain of langchain

I did not add a test for this change, since I don't see langchain installed in the KeyBERT test env, so I posted the test below (same as in the `keybert.llm.LangChain` docstring).

New user experience, also how to test this change: see the code example earlier in this thread.
Discussion
@MaartenGr Please review these concerns below.
While implementing this PR, I found it a bit hard to identify what KeyBERT wants to provide on top of langchain, so please review whether the new user experience above makes sense. My understanding is that `keybert.llm.LangChain` is meant to support a chain, not an LLM model. However, a chain can be so flexible (containing prompts, LLMs, output formats/parsers, etc.) that it is very tricky to change the prompt template of a given chain inside our own `keybert.llm.LangChain` by taking a prompt arg. So in this PR, I am suggesting that users use `LangChain.DEFAULT_PROMPT_TEMPLATE` when they create the chain, and the prompt arg is removed from `keybert.llm.LangChain.__init__`.
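A hypothetical sketch of that suggestion, assuming `LangChain.DEFAULT_PROMPT_TEMPLATE` is a template string that `PromptTemplate.from_template` can consume (check the class attribute for the actual placeholder syntax):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

from keybert.llm import LangChain

# Hypothetical: the user builds the chain around keybert's default template.
# Assumes the template's placeholders match PromptTemplate's {variable} syntax.
prompt = PromptTemplate.from_template(LangChain.DEFAULT_PROMPT_TEMPLATE)
chain = prompt | ChatOpenAI(model="gpt-4o", temperature=0) | StrOutputParser()
```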
I see `keybert.llm.LangChain` imports langchain in the source code (this line), but does not list it as a dependency in `pyproject.toml`. Are we assuming that before `keybert.llm.LangChain` is called, the user must have installed langchain, since they have to write their own chain?