Bug Description
If your LLM provider returns a 429 error because the quota has been exceeded, the Cheshire Cat container fails to start. During initialization the app hits the model error and immediately shuts down with the message: "Application startup failed. Exiting."
Steps to Reproduce
- Use an LLM/Embedder API key that has already exceeded its quota.
- Run `docker compose down`.
- Run `docker compose up`.
- Check the logs.
Expected Behavior
The container should start normally even if the provider returns a 429, and it should handle the error gracefully without shutting down the entire application.
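One way to achieve this (a minimal sketch only; `probe_embedder_size` and the fallback dimension are hypothetical illustrations, not part of the Cheshire Cat codebase) would be to catch provider errors around the `embed_query("hello world")` probe that the traceback below shows in `load_memory`, and fall back instead of letting the exception abort startup:

```python
# Hypothetical sketch: tolerate provider errors (e.g. a 429 / quota-exceeded)
# when probing the embedder vector size at startup, instead of letting the
# exception kill the whole application.
def probe_embedder_size(embedder, fallback_size=768):
    """Return the embedding dimension, or a configured fallback on failure."""
    try:
        return len(embedder.embed_query("hello world"))
    except Exception as exc:  # e.g. ResourceExhausted / GoogleGenerativeAIError
        print(f"Embedder probe failed ({exc}); using fallback size {fallback_size}")
        return fallback_size

# Simulate a provider that is over quota:
class OverQuotaEmbedder:
    def embed_query(self, text):
        raise RuntimeError("429 You exceeded your current quota")

print(probe_embedder_size(OverQuotaEmbedder()))  # falls back instead of crashing
```

A fallback size alone may not be enough if the memory collections must match the real embedder dimension, but at minimum the app could start in a degraded state and surface the provider error in the admin UI rather than exiting.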
Additional context
Logs:
[notice] A new release of pip is available: 25.1.1 -> 25.3
[notice] To update, run: pip install --upgrade pip
DEBUG: Import module cat.plugins.influ_b2b.tools
DEBUG: Import module cat.plugins.influ_b2b.custom-api
DEBUG: Import module cat.plugins.cat_advanced_tools.settings
DEBUG: Import module cat.plugins.cat_advanced_tools.fast_setup
DEBUG: Executing core_plugin::factory_allowed_auth_handlers with priority 0
DEBUG: Initializing WhiteRabbit...
DEBUG: WhiteRabbit: Starting scheduler
[INFO] Scheduler started
DEBUG: WhiteRabbit: Scheduler started
DEBUG: Executing core_plugin::before_cat_bootstrap with priority 0
[DEBUG] Looking for jobs to run
DEBUG: Executing core_plugin::factory_allowed_llms with priority 0
[DEBUG] No jobs; waiting until a job is added
[DEBUG] Using AsyncIOEngine.POLLER as I/O engine
DEBUG: Executing core_plugin::factory_allowed_embedders with priority 0
ERROR: Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/langchain_google_genai/embeddings.py", line 225, in embed_documents
result = self.client.batch_embed_contents(
File "/usr/local/lib/python3.10/site-packages/google/ai/generativelanguage_v1beta/services/generative_service/client.py", line 1365, in batch_embed_contents
response = rpc(
File "/usr/local/lib/python3.10/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/google/api_core/retry/retry_unary.py", line 294, in retry_wrapped_func
return retry_target(
File "/usr/local/lib/python3.10/site-packages/google/api_core/retry/retry_unary.py", line 156, in retry_target
next_sleep = _retry_error_helper(
File "/usr/local/lib/python3.10/site-packages/google/api_core/retry/retry_base.py", line 214, in _retry_error_helper
raise final_exc from source_exc
File "/usr/local/lib/python3.10/site-packages/google/api_core/retry/retry_unary.py", line 147, in retry_target
result = target()
File "/usr/local/lib/python3.10/site-packages/google/api_core/timeout.py", line 130, in func_with_timeout
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ResourceExhausted: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit.
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0 [links {
description: "Learn more about Gemini API quotas"
url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
quota_id: "EmbedContentRequestsPerDayPerProjectPerModel-FreeTier"
}
violations {
quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
quota_id: "EmbedContentRequestsPerMinutePerProjectPerModel-FreeTier"
}
violations {
quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
quota_id: "EmbedContentRequestsPerMinutePerUserPerProjectPerModel-FreeTier"
}
violations {
quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
quota_id: "EmbedContentRequestsPerDayPerUserPerProjectPerModel-FreeTier"
}
]
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
async with self.lifespan_context(app) as maybe_state:
File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
async with original_context(app) as maybe_original_state:
File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/app/cat/startup.py", line 42, in lifespan
app.state.ccat = CheshireCat(cheshire_cat_api)
File "/app/cat/utils.py", line 325, in getinstance
cls.instances[class_] = class_(*args, **kwargs)
File "/app/cat/looking_glass/cheshire_cat.py", line 88, in __init__
self.load_memory()
File "/app/cat/looking_glass/cheshire_cat.py", line 290, in load_memory
embedder_size = len(self.embedder.embed_query("hello world"))
File "/usr/local/lib/python3.10/site-packages/langchain_google_genai/embeddings.py", line 254, in embed_query
return self.embed_documents(
File "/usr/local/lib/python3.10/site-packages/langchain_google_genai/embeddings.py", line 229, in embed_documents
raise GoogleGenerativeAIError(f"Error embedding content: {e}") from e
langchain_google_genai._common.GoogleGenerativeAIError: Error embedding content: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit.
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0 [links {
description: "Learn more about Gemini API quotas"
url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
quota_id: "EmbedContentRequestsPerDayPerProjectPerModel-FreeTier"
}
violations {
quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
quota_id: "EmbedContentRequestsPerMinutePerProjectPerModel-FreeTier"
}
violations {
quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
quota_id: "EmbedContentRequestsPerMinutePerUserPerProjectPerModel-FreeTier"
}
violations {
quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
quota_id: "EmbedContentRequestsPerDayPerUserPerProjectPerModel-FreeTier"
}
]
ERROR: Application startup failed. Exiting.