google-genai: Fix TaskType for embedding queries when using vector store retriever #5939
pranav-kural started this conversation in Ideas
Replies: 0 comments
Feature request
When using the `GoogleGenerativeAIEmbeddings` class from `@langchain/google-genai` to create an embedding model instance for use with a vector store, we can optionally specify a `TaskType`. For the task of generating embeddings for the documents of a knowledge base (to store in the vector store), this task type would ideally be `RETRIEVAL_DOCUMENT`.

However, when using the vector store retriever, the query being sent should ideally be embedded with the `TaskType` of `RETRIEVAL_QUERY`. Currently, this is not the case.
`_embedQueryContent` currently calls `_convertToContent` with only the query text. In the object returned by `_convertToContent`, the `taskType` property is set to `this.taskType`. If a `TaskType` was provided when the `GoogleGenerativeAIEmbeddings` instance was created, that task type gets used, which might be inaccurate. For example, if the instance was configured with `TaskType.RETRIEVAL_DOCUMENT` when created (as described above), it will use this task type even though it is actually supposed to be `TaskType.RETRIEVAL_QUERY` when embedding query text.

File: https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-google-genai/src/embeddings.ts
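The behavior described above can be modeled with a small self-contained sketch. Note that `CurrentBehaviorSketch`, `convertToContent`, `buildQueryRequest`, and the string-based `TaskType` are simplified stand-ins invented for illustration; this is not the library's actual source, and no API call is made.

```typescript
// Local stand-in for the SDK's TaskType (assumption: only two values shown).
type TaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

// Hypothetical request shape: the text plus the task type it is tagged with.
interface EmbedRequest {
  text: string;
  taskType: TaskType | undefined;
}

// Simplified model of the current behavior, not the real class.
class CurrentBehaviorSketch {
  private readonly taskType: TaskType | undefined;

  constructor(taskType?: TaskType) {
    this.taskType = taskType;
  }

  // Always tags the request with the instance-level task type.
  convertToContent(text: string): EmbedRequest {
    return { text, taskType: this.taskType };
  }

  // Builds the request for a query; only the text is passed along.
  buildQueryRequest(query: string): EmbedRequest {
    return this.convertToContent(query);
  }
}

// Instance configured for indexing knowledge-base documents.
const docConfigured = new CurrentBehaviorSketch("RETRIEVAL_DOCUMENT");
const currentQueryRequest = docConfigured.buildQueryRequest("What is LangChain?");

// The query inherits the document task type instead of RETRIEVAL_QUERY.
console.log(currentQueryRequest.taskType); // prints "RETRIEVAL_DOCUMENT"
```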
Motivation
Without this fix, the full benefit of specifying the `TaskType` parameter for a Google Generative AI model can't be realized. Instead, it may lead to instances of using (knowingly or unknowingly) the wrong value for the `TaskType`.

Proposal (If applicable)
I would like to suggest the following solution.

Update the `_convertToContent` method to accept an additional optional argument for `taskType`, and update the `_embedQueryContent` method to send `RETRIEVAL_QUERY` as the task type by default to the `_convertToContent` method.

File: https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-google-genai/src/embeddings.ts
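A minimal sketch of how the proposed change might look, under the assumption that an explicit argument should take precedence over the instance-level value. `ProposedBehaviorSketch` and its method names are hypothetical stand-ins, not the library's code, and the string-based `TaskType` is a local substitute for the SDK's enum.

```typescript
// Local stand-in for the SDK's TaskType (assumption: only two values shown).
type TaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

// Hypothetical request shape: the text plus the task type it is tagged with.
interface EmbedRequest {
  text: string;
  taskType: TaskType | undefined;
}

// Simplified model of the proposed behavior, not the real class.
class ProposedBehaviorSketch {
  private readonly taskType: TaskType | undefined;

  constructor(taskType?: TaskType) {
    this.taskType = taskType;
  }

  // The optional second argument overrides the instance-level task type.
  convertToContent(text: string, taskType?: TaskType): EmbedRequest {
    return { text, taskType: taskType ?? this.taskType };
  }

  // Documents keep using whatever the instance was configured with.
  buildDocumentRequest(doc: string): EmbedRequest {
    return this.convertToContent(doc);
  }

  // Queries are tagged as RETRIEVAL_QUERY by default.
  buildQueryRequest(query: string): EmbedRequest {
    return this.convertToContent(query, "RETRIEVAL_QUERY");
  }
}

const proposed = new ProposedBehaviorSketch("RETRIEVAL_DOCUMENT");
const proposedDocRequest = proposed.buildDocumentRequest("LangChain is a framework.");
const proposedQueryRequest = proposed.buildQueryRequest("What is LangChain?");

console.log(proposedDocRequest.taskType);   // prints "RETRIEVAL_DOCUMENT"
console.log(proposedQueryRequest.taskType); // prints "RETRIEVAL_QUERY"
```

With this shape, a single instance could serve both indexing and retrieval without the caller having to track which task type applies to which call.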
There is a workaround for now (but it's a bit tricky to manage):
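One possible shape for such a workaround, offered as an assumption rather than the exact snippet from the original post, is to keep two embedding instances (one per task type) and route calls between them. `EmbeddingsLike`, `SplitTaskEmbeddings`, and `makeRecordingEmbedder` are hypothetical names; the only thing borrowed from LangChain is the pair of method names `embedDocuments` and `embedQuery`.

```typescript
// Minimal shape of an embeddings object: the two methods a vector store calls.
interface EmbeddingsLike {
  embedDocuments(texts: string[]): Promise<number[][]>;
  embedQuery(text: string): Promise<number[]>;
}

// Hypothetical wrapper: documents go to one instance, queries to another.
class SplitTaskEmbeddings implements EmbeddingsLike {
  private readonly documentEmbedder: EmbeddingsLike;
  private readonly queryEmbedder: EmbeddingsLike;

  constructor(documentEmbedder: EmbeddingsLike, queryEmbedder: EmbeddingsLike) {
    this.documentEmbedder = documentEmbedder;
    this.queryEmbedder = queryEmbedder;
  }

  embedDocuments(texts: string[]): Promise<number[][]> {
    return this.documentEmbedder.embedDocuments(texts);
  }

  embedQuery(text: string): Promise<number[]> {
    return this.queryEmbedder.embedQuery(text);
  }
}

// Hypothetical fake embedder that records which instance handled each call.
function makeRecordingEmbedder(label: string, calls: string[]): EmbeddingsLike {
  return {
    embedDocuments(texts: string[]): Promise<number[][]> {
      calls.push(`${label}:documents`);
      return Promise.resolve(texts.map(() => [0]));
    },
    embedQuery(_text: string): Promise<number[]> {
      calls.push(`${label}:query`);
      return Promise.resolve([0]);
    },
  };
}

const recordedCalls: string[] = [];
const splitEmbeddings = new SplitTaskEmbeddings(
  makeRecordingEmbedder("doc", recordedCalls),
  makeRecordingEmbedder("query", recordedCalls),
);

// The fakes record synchronously, so the log is filled before any promise settles.
void splitEmbeddings.embedDocuments(["first document", "second document"]);
void splitEmbeddings.embedQuery("a user question");
console.log(recordedCalls); // prints [ 'doc:documents', 'query:query' ]
```

In real use, the two fakes would presumably be replaced by two `GoogleGenerativeAIEmbeddings` instances, one created with `TaskType.RETRIEVAL_DOCUMENT` and one with `TaskType.RETRIEVAL_QUERY`. The tricky part is keeping both instances configured identically in every other respect.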