Skip to content

[BUG] Cheshire Cat cannot launch if LLM quota is exceeded #1107

@Gabbosaur

Description

@Gabbosaur

Bug Description
If your LLM provider returns a 429 error because the quota has been exceeded, the Cheshire Cat container fails to start. During initialization the app hits the model error and immediately shuts down with the message: "Application startup failed. Exiting."

Steps to Reproduce

  1. Use an LLM/Embedder API key that has already exceeded its quota.
  2. Run docker compose down.
  3. Run docker compose up.
  4. Check the logs.

Expected Behavior
The container should start normally even if the provider returns a 429, and it should handle the error gracefully without shutting down the entire application.

Additional context
Logs:

[notice] A new release of pip is available: 25.1.1 -> 25.3
[notice] To update, run: pip install --upgrade pip
DEBUG:  Import module cat.plugins.influ_b2b.tools
DEBUG:  Import module cat.plugins.influ_b2b.custom-api
DEBUG:  Import module cat.plugins.cat_advanced_tools.settings
DEBUG:  Import module cat.plugins.cat_advanced_tools.fast_setup
DEBUG:  Executing core_plugin::factory_allowed_auth_handlers with priority 0
DEBUG:  Initializing WhiteRabbit...
DEBUG:  WhiteRabbit: Starting scheduler
[INFO] Scheduler started
DEBUG:  WhiteRabbit: Scheduler started
DEBUG:  Executing core_plugin::before_cat_bootstrap with priority 0
[DEBUG] Looking for jobs to run
DEBUG:  Executing core_plugin::factory_allowed_llms with priority 0
[DEBUG] No jobs; waiting until a job is added
[DEBUG] Using AsyncIOEngine.POLLER as I/O engine
DEBUG:  Executing core_plugin::factory_allowed_embedders with priority 0
ERROR:    Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/langchain_google_genai/embeddings.py", line 225, in embed_documents
    result = self.client.batch_embed_contents(
  File "/usr/local/lib/python3.10/site-packages/google/ai/generativelanguage_v1beta/services/generative_service/client.py", line 1365, in batch_embed_contents
    response = rpc(
  File "/usr/local/lib/python3.10/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/google/api_core/retry/retry_unary.py", line 294, in retry_wrapped_func
    return retry_target(
  File "/usr/local/lib/python3.10/site-packages/google/api_core/retry/retry_unary.py", line 156, in retry_target
    next_sleep = _retry_error_helper(
  File "/usr/local/lib/python3.10/site-packages/google/api_core/retry/retry_base.py", line 214, in _retry_error_helper
    raise final_exc from source_exc
  File "/usr/local/lib/python3.10/site-packages/google/api_core/retry/retry_unary.py", line 147, in retry_target
    result = target()
  File "/usr/local/lib/python3.10/site-packages/google/api_core/timeout.py", line 130, in func_with_timeout
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ResourceExhausted: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0 [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
  quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
  quota_id: "EmbedContentRequestsPerDayPerProjectPerModel-FreeTier"
}
violations {
  quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
  quota_id: "EmbedContentRequestsPerMinutePerProjectPerModel-FreeTier"
}
violations {
  quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
  quota_id: "EmbedContentRequestsPerMinutePerUserPerProjectPerModel-FreeTier"
}
violations {
  quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
  quota_id: "EmbedContentRequestsPerDayPerUserPerProjectPerModel-FreeTier"
}
]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/app/cat/startup.py", line 42, in lifespan
    app.state.ccat = CheshireCat(cheshire_cat_api)
  File "/app/cat/utils.py", line 325, in getinstance
    cls.instances[class_] = class_(*args, **kwargs)
  File "/app/cat/looking_glass/cheshire_cat.py", line 88, in __init__
    self.load_memory()
  File "/app/cat/looking_glass/cheshire_cat.py", line 290, in load_memory
    embedder_size = len(self.embedder.embed_query("hello world"))
  File "/usr/local/lib/python3.10/site-packages/langchain_google_genai/embeddings.py", line 254, in embed_query
    return self.embed_documents(
  File "/usr/local/lib/python3.10/site-packages/langchain_google_genai/embeddings.py", line 229, in embed_documents
    raise GoogleGenerativeAIError(f"Error embedding content: {e}") from e
langchain_google_genai._common.GoogleGenerativeAIError: Error embedding content: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0
* Quota exceeded for metric: generativelanguage.googleapis.com/embed_content_free_tier_requests, limit: 0 [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
  quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
  quota_id: "EmbedContentRequestsPerDayPerProjectPerModel-FreeTier"
}
violations {
  quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
  quota_id: "EmbedContentRequestsPerMinutePerProjectPerModel-FreeTier"
}
violations {
  quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
  quota_id: "EmbedContentRequestsPerMinutePerUserPerProjectPerModel-FreeTier"
}
violations {
  quota_metric: "generativelanguage.googleapis.com/embed_content_free_tier_requests"
  quota_id: "EmbedContentRequestsPerDayPerUserPerProjectPerModel-FreeTier"
}
]

ERROR:    Application startup failed. Exiting.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions