"string indices must be integers, not 'str'" after Successful asimilarity_search Execution #25945

octadion · 2024-09-03T02:22:53Z


Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

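import traceback
from typing import Any, List, Optional

from langchain_core.callbacks import Callbacks
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.vectorstores import VectorStore

# DialoqbaseLibArgs and similarity_search_vector_with_score are defined
# elsewhere in the Dialoqbase code base and are not shown here.


class DialoqbaseVectorStore(VectorStore):
    def __init__(self, embeddings: Embeddings, args: DialoqbaseLibArgs):
        self.bot_id = args.bot_id
        self.source_id = args.source_id
        self.embeddings = embeddings

    async def asimilarity_search(
        self, query: str, k: int = 4, filter: Optional[Any] = None, callbacks: Optional[Callbacks] = None
    ) -> List[Document]:
        try:
            # Returns a list of (Document, score) tuples.
            results = await self.similarity_search_vector_with_score(
                self.embeddings.embed_query(query), k, filter, query
            )
            print([result[0] for result in results])
            # Keep only the Document from each (Document, score) tuple.
            return [result[0] for result in results]
        except Exception as e:
            print("error", e)
            # Print the full traceback and re-raise instead of silently returning None.
            traceback.print_exc()
            raise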
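Printing only the exception message, as print("error", e) does, hides where the exception was raised, and that location is the open question here. Below is a minimal sketch that uses only the standard traceback module. It awaits the coroutine that triggers the search. On failure, it prints the full traceback and returns the innermost frame. fake_downstream_step is a hypothetical consumer used only to demonstrate the helper.

```python
import asyncio
import traceback
from typing import Any, Awaitable, Optional


async def run_and_locate_error(coro: Awaitable[Any]) -> Optional[traceback.FrameSummary]:
    """Await coro; if it raises, print the full traceback and return the innermost frame."""
    try:
        await coro
    except Exception as exc:
        traceback.print_exc()
        # The last entry of the extracted traceback is the frame that raised.
        return traceback.extract_tb(exc.__traceback__)[-1]
    return None


async def fake_downstream_step() -> None:
    # Hypothetical consumer that expects a dict but receives a string.
    doc = "hai \n\n---\n\nhalo"
    print(doc["page_content"])


if __name__ == "__main__":
    frame = asyncio.run(run_and_locate_error(fake_downstream_step()))
    if frame is not None:
        print(f"raised in {frame.name} at {frame.filename}:{frame.lineno}")
```

In the real setup, you would pass in whatever call triggers the search, for example a retriever's ainvoke(query). This is an assumption about how the store is wired up. The returned frame names the file, function and line that indexed a string with a string key.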

Description

Hi everyone,

I'm experiencing an issue with my custom class DialoqbaseVectorStore that extends VectorStore. I implemented an asynchronous method asimilarity_search to perform similarity searches. The method successfully returns the results, but after that, I encounter an error saying, "string indices must be integers, not 'str'".

The results from asimilarity_search look something like this:
[Document(metadata={'loc': {'lines': {'to': 1, 'from': 1}}, 'source': 'text'}, page_content='hai \n\n---\n\nhalo')]

The search seems to work correctly, and I can see the correct output in the logs. However, the error "string indices must be integers, not 'str'" appears right afterward.
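For context, this TypeError means some code indexed a str with a string key, as if the str were a dict. The small self-contained sketch below shows two common ways that happens. The first is a JSON string that was never parsed. The second is iterating over a dict, which yields its keys rather than its values. The payload and record values are made up for illustration.

```python
import json
from typing import Any, Optional


def index_error_message(value: Any, key: str) -> Optional[str]:
    """Return the TypeError message raised by value[key], or None if indexing works."""
    try:
        value[key]
    except TypeError as exc:
        return str(exc)
    return None


# A JSON payload that was never parsed is still a str.
payload = '{"page_content": "hai", "metadata": {"source": "text"}}'
print(index_error_message(payload, "page_content"))
print(index_error_message(json.loads(payload), "page_content"))  # None: a dict accepts str keys

# Iterating over a dict yields its keys, which are strings.
record = {"doc-1": {"text": "hai"}}
for item in record:
    print(index_error_message(item, "text"))
```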

Where and how might this result be used that would cause this error?
What might be causing this error given that the search seems to be returning the correct output?

I appreciate any guidance or suggestions you might have. Thank you!

System Info

langchain==0.2.14
langchain-anthropic==0.1.23
langchain-cohere==0.2.3
langchain-community==0.2.12
langchain-core==0.2.35
langchain-experimental==0.0.64
langchain-google-genai==1.0.10
langchain-ollama==0.1.1
langchain-openai==0.1.19
langchain-text-splitters==0.2.2
langchainhub==0.1.21
langfuse==2.44.1
langgraph==0.1.8
langsmith==0.1.106

@dosu (dosubot[bot]) · 2024-09-03T02:24:40Z

The error "string indices must be integers, not 'str'" that you see after asimilarity_search might be caused by the way the documents_to_insert list is built in the add_texts method. Specifically, documents_to_insert is a list containing a single dictionary whose keys are the document IDs, rather than a list with one dictionary per document. When you iterate over or index into this structure, you might inadvertently treat that dictionary as a list, or treat its string keys as documents.

Here's the relevant part of the add_texts method:

documents_to_insert = [
    {
        id: {
            self._text_key: text,
            self._embedding_key: vector,
            self._metadata_key: metadata,
        }
        for id, text, vector, metadata in zip(
            ids, texts, embedded_texts, metadatas
        )
    }
]

This creates a list with a single dictionary, where the dictionary keys are the document IDs. When you later try to access elements of documents_to_insert, you might be treating the dictionary as if it were a list, leading to the "string indices must be integers" error.
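The sketch below reproduces the structure described above with made-up inputs. The plain keys "text", "embedding" and "metadata" stand in for self._text_key, self._embedding_key and self._metadata_key; these are assumptions for illustration. The list holds a single dict keyed by ID, so code that treats each entry as a document ends up indexing an ID string.

```python
# Sample inputs (hypothetical); "text", "embedding" and "metadata" stand in
# for self._text_key, self._embedding_key and self._metadata_key.
ids = ["id-1", "id-2"]
texts = ["hai", "halo"]
embedded_texts = [[0.1, 0.2], [0.3, 0.4]]
metadatas = [{"source": "text"}, {"source": "text"}]

# Same shape as the snippet above: a list holding one dict keyed by ID.
documents_to_insert = [
    {
        doc_id: {"text": text, "embedding": vector, "metadata": metadata}
        for doc_id, text, vector, metadata in zip(ids, texts, embedded_texts, metadatas)
    }
]

print(len(documents_to_insert))  # 1, not 2
print(list(documents_to_insert[0]))  # ['id-1', 'id-2']

# Treating each entry of that dict as a document means indexing an ID string.
for entry in documents_to_insert[0]:
    try:
        print(entry["text"])
    except TypeError as exc:
        print(f"{entry!r}: {exc}")
```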

To fix this, you should ensure that documents_to_insert is a list of dictionaries, each representing a document to be inserted. Here's a corrected version of the code:

documents_to_insert = [
    {
        self._text_key: text,
        self._embedding_key: vector,
        self._metadata_key: metadata,
    }
    for text, vector, metadata in zip(texts, embedded_texts, metadatas)
]

# Insert in batches
for i in range(0, len(documents_to_insert), batch_size):
    batch = documents_to_insert[i : i + batch_size]
    try:
        result = self._collection.upsert_multi({id: doc for id, doc in zip(ids[i : i + batch_size], batch)})
        if result.all_ok:
            doc_ids.extend(ids[i : i + batch_size])
    except DocumentExistsException as e:
        raise ValueError(f"Document already exists: {e}")

This way, documents_to_insert is a list of dictionaries, and you correctly create a dictionary for upsert_multi with document IDs as keys and document data as values. This should resolve the "string indices must be integers" error.
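Below is a dependency-free sketch of the corrected batching logic. Here upsert_multi is a hypothetical in-memory stand-in for the collection's bulk upsert, not the real client method, and the field names are placeholders.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def upsert_multi(store: Dict[str, Doc], docs: Dict[str, Doc]) -> bool:
    """Hypothetical stand-in for a collection's bulk upsert: stores every id -> doc pair."""
    store.update(docs)
    return True


def add_texts(
    store: Dict[str, Doc],
    ids: List[str],
    texts: List[str],
    embedded_texts: List[List[float]],
    metadatas: List[Doc],
    batch_size: int = 2,
) -> List[str]:
    # One dict per document, so slicing into batches works as intended.
    documents_to_insert = [
        {"text": text, "embedding": vector, "metadata": metadata}
        for text, vector, metadata in zip(texts, embedded_texts, metadatas)
    ]
    doc_ids: List[str] = []
    for i in range(0, len(documents_to_insert), batch_size):
        batch = documents_to_insert[i : i + batch_size]
        batch_ids = ids[i : i + batch_size]
        # Pair each ID with its document for this batch only.
        if upsert_multi(store, dict(zip(batch_ids, batch))):
            doc_ids.extend(batch_ids)
    return doc_ids


demo_store: Dict[str, Doc] = {}
print(add_texts(demo_store, ["a", "b", "c"], ["hai", "halo", "hey"], [[0.1], [0.2], [0.3]], [{}, {}, {}]))
# ['a', 'b', 'c'] (written in two batches: ['a', 'b'] then ['c'])
```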

Additionally, ensure that your similarity_search method correctly parses the search response. Here is an example from the PGVecto_rs class:

def similarity_search_with_score_by_vector(
    self,
    query_vector: List[float],
    k: int = 4,
    distance_func: Literal[
        "sqrt_euclid", "neg_dot_prod", "ned_cos"
    ] = "sqrt_euclid",
    filter: Union[None, Dict[str, Any], Any] = None,
    **kwargs: Any,
) -> List[Tuple[Document, float]]:
    """Return docs most similar to query vector, with its score."""

    from pgvecto_rs.sdk.filters import meta_contains
    distance_func_map = {
        "sqrt_euclid": "<->",
        "neg_dot_prod": "<#>",
        "ned_cos": "<=>",
    }
    if filter is None:
        real_filter = None
    elif isinstance(filter, dict):
        real_filter = meta_contains(filter)
    else:
        real_filter = filter
    results = self._store.search(  # type: ignore[union-attr]
        query_vector,
        distance_func_map[distance_func],
        k,
        filter=real_filter,
    )

    return [
        (
            Document(
                page_content=res[0].text,
                metadata=res[0].meta,
            ),
            res[1],
        )
        for res in results
    ]
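Below is a dependency-free sketch of the same conversion step. It assumes each hit is a (record, score) tuple whose record exposes .text and .meta, as in the pgvecto_rs snippet. The Document and Record dataclasses are stand-ins, not the real langchain_core or pgvecto_rs classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Document:
    """Minimal stand-in for langchain_core.documents.Document (assumption, for a dependency-free demo)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Record:
    """Hypothetical search-hit object exposing .text and .meta, like the records used above."""
    text: str
    meta: Dict[str, Any]


def to_documents(results: List[Tuple[Record, float]]) -> List[Tuple[Document, float]]:
    # Each hit is a (record, score) tuple: unpack it, then read attributes from the record.
    return [(Document(page_content=rec.text, metadata=rec.meta), score) for rec, score in results]


hits = [(Record("hai \n\n---\n\nhalo", {"source": "text"}), 0.76)]
docs = to_documents(hits)
print(docs[0][0].page_content == "hai \n\n---\n\nhalo", docs[0][1])  # True 0.76
```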

The LangChain Redis vector store follows the same pattern. Its similarity_search_with_score method parses the response to extract the document content and metadata:

def similarity_search_with_score(
    self,
    query: str,
    k: int = 4,
    filter: Optional[RedisFilterExpression] = None,
    return_metadata: bool = True,
    **kwargs: Any,
) -> List[Tuple[Document, float]]:
    """Run similarity search with **vector distance**.

    The "scores" returned from this function are the raw vector
    distances from the query vector. For similarity scores, use
    ``similarity_search_with_relevance_scores``.
    """
    query_embedding = self._embeddings.embed_query(query)

    redis_query, params_dict = self._prepare_query(
        query_embedding,
        k=k,
        filter=filter,
        with_metadata=return_metadata,
        with_distance=True,
        **kwargs,
    )

    # Perform vector search
    try:
        results = self.client.ft(self.index_name).search(redis_query, params_dict)  # type: ignore
    except redis.exceptions.ResponseError as e:
        if str(e).split(" ")[0] == "Syntax":
            raise ValueError(
                "Query failed with syntax error. "
                + "This is likely due to malformation of "
                + "filter, vector, or query argument"
            ) from e
        raise e

    # Prepare document results
    docs = []
    for result in results.docs:
        metadata = {}
        if return_metadata:
            metadata = {"id": result.id}
            metadata.update(self._collect_metadata(result))

        content_key = self._schema.content_key
        docs.append(
            Document(page_content=getattr(result, content_key), metadata=metadata)
        )
    return docs

Ensure that your similarity_search method correctly handles the response structure to avoid similar issues ^[1]^[2]^[3]^[4].

To continue talking to Dosu, mention @dosu.

4 replies

octadion (Author) · Sep 3, 2024

@dosu
After obtaining the results from asimilarity_search, which function is executed next? Which file?
From what I found, the result format [(Document(metadata={'loc': {'lines': {'to': 1, 'from': 1}}, 'source': 'text-cm0agxyxe0005u6q5re8l4c9s'}, page_content='hai \n\n---\n\nhalo'), 0.7616697237885327)] should be correct, but I'm still getting the error "string indices must be integers, not 'str'".

dosubot[bot] · Sep 3, 2024

After asimilarity_search returns, control passes back to its caller, _aget_relevant_documents, which is the retriever method that invokes the vector store's search.

The error "string indices must be integers, not 'str'" means that somewhere in the code a string is being indexed with a string key, as if it were a dictionary, instead of with an integer index. One candidate is the retriever code in libs/community/langchain_community/vectorstores/azuresearch.py.

Here is the relevant part of the code:

async def _aget_relevant_documents(
    self,
    query: str,
    *,
    run_manager: AsyncCallbackManagerForRetrieverRun,
    **kwargs: Any,
) -> List[Document]:
    params = {**self.search_kwargs, **kwargs}
    if self.search_type == "similarity":
        docs = await self.vectorstore.avector_search(query, k=self.k, **params)
    elif self.search_type == "similarity_score_threshold":
        docs_and_scores = (
            await self.vectorstore.asimilarity_search_with_relevance_scores(
                query, k=self.k, **params
            )
        )
        docs = [doc for doc, _ in docs_and_scores]
    elif self.search_type == "hybrid":
        docs = await self.vectorstore.ahybrid_search(query, k=self.k, **params)
    elif self.search_type == "hybrid_score_threshold":
        docs_and_scores = (
            await self.vectorstore.ahybrid_search_with_relevance_scores(
                query, k=self.k, **params
            )
        )
        docs = [doc for doc, _ in docs_and_scores]
    elif self.search_type == "semantic_hybrid":
        docs = await self.vectorstore.asemantic_hybrid_search(
            query, k=self.k, **params
        )
    elif self.search_type == "semantic_hybrid_score_threshold":
        docs = [
            doc
            for doc, _ in await self.vectorstore.asemantic_hybrid_search_with_score(
                query, k=self.k, **params
            )
        ]
    else:
        raise ValueError(f"search_type of {self.search_type} not allowed.")
    return docs

This function calls various asynchronous search methods, including asimilarity_search_with_relevance_scores, ahybrid_search, and asemantic_hybrid_search_with_score, which could be the source of the error.

Additionally, libs/partners/qdrant/langchain_qdrant/vectorstores.py shows how asimilarity_search can delegate to asimilarity_search_with_score and keep only the Document from each (Document, score) tuple:

@sync_call_fallback
async def asimilarity_search(
    self,
    query: str,
    k: int = 4,
    filter: Optional[MetadataFilter] = None,
    **kwargs: Any,
) -> List[Document]:
    """Return docs most similar to query.
    Args:
        query: Text to look up documents similar to.
        k: Number of Documents to return. Defaults to 4.
        filter: Filter by metadata. Defaults to None.
    Returns:
        List of Documents most similar to the query.
    """
    results = await self.asimilarity_search_with_score(query, k, filter, **kwargs)
    return list(map(itemgetter(0), results))

This example demonstrates how to properly handle and format the results, which might help in identifying and fixing the error in your code ^[1]^[2]^[3].
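For a list of (Document, score) tuples, itemgetter(0), tuple unpacking and result[0] all select the same objects. Switching between them therefore cannot, by itself, change the outcome, which suggests the TypeError is raised by whatever consumes the returned list. The sketch below checks this equivalence, with plain strings standing in for Document objects.

```python
from operator import itemgetter

# Shape reported in the thread: a list of (document, score) tuples.
# Plain strings stand in for Document objects here.
results = [("doc-a", 0.7616697237885327), ("doc-b", 0.52)]

via_itemgetter = list(map(itemgetter(0), results))
via_unpacking = [doc for doc, _score in results]
via_indexing = [result[0] for result in results]

print(via_itemgetter == via_unpacking == via_indexing)  # True
```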


octadion (Author) · Sep 3, 2024

@dosu

Yes, I know that asimilarity_search is called from _aget_relevant_documents, but where do its results go next? I've tried returning list(map(itemgetter(0), results)), but I'm still getting the same error.

dosubot[bot] · Sep 3, 2024

After asimilarity_search returns its results to _aget_relevant_documents, the next function likely executed is _aresults_to_documents. This function converts the search results into a list of tuples, each pairing a Document object with its score. The error "string indices must be integers, not 'str'" could be raised here if the function indexes into an element of the results that is not the dictionary it expects.

Here is the relevant code snippet:

async def _aresults_to_documents(
    results: AsyncSearchItemPaged[Dict],
) -> List[Tuple[Document, float]]:
    docs = [
        (
            _result_to_document(result),
            float(result["@search.score"]),
        )
        async for result in results
    ]
    return docs

The error would occur at result["@search.score"] if result is a string rather than a dictionary, because indexing a str with a str key raises exactly this TypeError. A dictionary that merely lacks the key would raise a KeyError instead ^[1].
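Below is a sketch of the same loop with an explicit type check. It uses a hypothetical async generator in place of AsyncSearchItemPaged and a made-up "content" field in place of the real content-field lookup. With this check, a non-dict hit fails with a message naming its actual type and value instead of the generic TypeError.

```python
import asyncio
from typing import Any, AsyncIterator, List, Tuple


async def fake_paged_results(items: List[Any]) -> AsyncIterator[Any]:
    """Hypothetical async pager standing in for AsyncSearchItemPaged."""
    for item in items:
        yield item


async def results_to_scores(results: AsyncIterator[Any]) -> List[Tuple[str, float]]:
    out: List[Tuple[str, float]] = []
    async for result in results:
        if not isinstance(result, dict):
            # Fail with the actual type and value instead of "string indices must be integers".
            raise TypeError(f"expected a dict search hit, got {type(result).__name__}: {result!r}")
        out.append((result["content"], float(result["@search.score"])))
    return out


hits = [{"content": "hai", "@search.score": "0.76"}]
print(asyncio.run(results_to_scores(fake_paged_results(hits))))  # [('hai', 0.76)]
```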

Additionally, ensure that the structure of the search results matches the expected format. For example, in the ElasticSearchBM25Retriever class, the _get_relevant_documents method processes the search results and converts them into Document objects. If the structure of the search results does not match the expected format, this could also lead to the "string indices must be integers, not 'str'" error:

def _get_relevant_documents(
    self, query: str, *, run_manager: CallbackManagerForRetrieverRun
) -> List[Document]:
    query_dict = {"query": {"match": {"content": query}}}
    res = self.client.search(index=self.index_name, body=query_dict)

    docs = []
    for r in res["hits"]["hits"]:
        docs.append(Document(page_content=r["_source"]["content"]))
    return docs

Ensure that res["hits"]["hits"] and r["_source"]["content"] are correctly accessed and contain the expected data ^[2].
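If the response shape is uncertain, a small helper can walk the nested keys and report which step hit a non-dict value. This is a generic sketch, not part of the Elasticsearch client; the dig name and the sample response are hypothetical.

```python
from typing import Any, Sequence


def dig(data: Any, path: Sequence[str]) -> Any:
    """Follow dict keys in path, failing clearly if an intermediate value is not a dict."""
    current = data
    for i, key in enumerate(path):
        if not isinstance(current, dict):
            where = "".join(f"[{k!r}]" for k in path[:i]) or "<root>"
            raise TypeError(f"value at {where} is {type(current).__name__}, not dict")
        current = current[key]
    return current


res = {"hits": {"hits": [{"_source": {"content": "hai"}}]}}
contents = [dig(hit, ["_source", "content"]) for hit in dig(res, ["hits", "hits"])]
print(contents)  # ['hai']
```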

