Replies: 2 comments 1 reply
-
🤖 Hey there, @aiwalter! Good to see you diving into LlamaIndex again. Always a pleasure to see familiar faces around here. Hope you're staying energized for coding!

Based on the similar issues I found in the LlamaIndex repository, there are a couple of ways you can handle the rate limit error.

One way is to manually embed your nodes one at a time and add a sleep time between each embedding. Here's an example of how you can do this:

```python
import time

from llama_index import VectorStoreIndex
from llama_index.embeddings import OpenAIEmbedding
from llama_index.node_parser import SimpleNodeParser

node_parser = SimpleNodeParser.from_defaults()
embed_model = OpenAIEmbedding()
nodes = node_parser.get_nodes_from_documents(documents)

# Embed one node at a time, sleeping between calls to stay under the rate limit.
for node in nodes:
    node.embedding = embed_model.get_text_embedding(node.text)
    time.sleep(21)

index = VectorStoreIndex(nodes, ....)
```

This solution was suggested in the issue [Bug]: Warning raising "llama_index.llms.openai_utils:Retrying llama_index.embeddings.openai.get_embeddings".

Another way is to update the retry policy used to get OpenAI embeddings. This solution was implemented in the pull request Update retry policy to get OpenAI Embeddings for trial accounts, which closed the issue How to solve limit rate during […].

However, please note that rate limit issues might also be due to OpenAI API availability issues, as mentioned in the issue Rate limit issue. In this case, there's not much that can be done on your side. You might want to try using a different model if the issue persists.

I hope this helps! If you have any other questions, feel free to ask.
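If you go the retry-policy route, here is a minimal sketch, assuming your llama_index version exposes `max_retries` and `embed_batch_size` as constructor arguments on `OpenAIEmbedding` (recent versions do; older ones handled retries internally):

```python
from llama_index.embeddings import OpenAIEmbedding

# Assumption: max_retries and embed_batch_size are supported by your
# llama_index version. More retries absorb transient 429s; a smaller
# batch size sends fewer tokens per request.
embed_model = OpenAIEmbedding(
    max_retries=10,
    embed_batch_size=10,
)
```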
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
I built a decorator that can help here:

```python
import functools
import logging
import time

LOGGER = logging.getLogger(__name__)


def retry(max_attempts: int = 4, initial_delay: int = 5, exception: type = Exception):
    """A decorator that retries a function call with an exponential backoff
    if it raises a specified exception.

    Parameters
    ----------
    max_attempts: Maximum number of attempts for retrying the function.
    initial_delay: Initial delay between retries, in seconds. This delay is
        doubled after each failure.
    exception: The exception type that triggers a retry.
    """
    def decorator_retry(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempts = 0
            delay = initial_delay
            while attempts < max_attempts:
                try:
                    return func(*args, **kwargs)
                except exception:
                    attempts += 1
                    LOGGER.warning(f"Retry required, attempt {attempts}: {func.__name__}")
                    if attempts >= max_attempts:
                        raise
                    time.sleep(delay)
                    delay *= 2  # Exponential backoff
        return wrapper
    return decorator_retry
```

Then:

```python
import openai

# Node and BaseEmbedding come from llama_index; the exact import path
# depends on your version.

@retry(exception=openai.RateLimitError)
def _embed_node(node: Node, embed_model: BaseEmbedding) -> Node:
    """Retry helper function."""
    node.embedding = embed_model.get_text_embedding(node.text)
    return node
```
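A possible way to wire this into the indexing flow (assuming `nodes`, `embed_model`, and `VectorStoreIndex` from the snippet above):

```python
# Hypothetical usage: embed each node, retrying with backoff on rate limits.
nodes = [_embed_node(node, embed_model) for node in nodes]
index = VectorStoreIndex(nodes)
```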
-
Is there a way to build a queue in case a `RateLimitError` happens, so that it waits e.g. 60 seconds before going on with the next batch? I keep getting embedding rate limit issues and would like to catch them by slowing down the number of tokens that are sent per minute, i.e. catching the exception and adding some sleep time.
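For what it's worth, here is a minimal sketch of that queue idea, assuming `nodes` and `embed_model` as in the snippets above; `get_text_embedding_batch` is part of the llama_index embedding interface, while the batch size and the 60-second wait are arbitrary choices here:

```python
import time

import openai

BATCH_SIZE = 50    # arbitrary; tune to your tokens-per-minute limit
WAIT_SECONDS = 60  # how long to back off after a RateLimitError

for start in range(0, len(nodes), BATCH_SIZE):
    batch = nodes[start : start + BATCH_SIZE]
    texts = [node.text for node in batch]
    while True:
        try:
            embeddings = embed_model.get_text_embedding_batch(texts)
            break
        except openai.RateLimitError:
            # Wait a minute, then retry the same batch.
            time.sleep(WAIT_SECONDS)
    for node, embedding in zip(batch, embeddings):
        node.embedding = embedding
```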