Add Hygraph circuit breaker client and caching#108
Conversation
|
Caution Review failedThe pull request is closed. Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughImplements a resilience layer for Hygraph API integration through a circuit breaker pattern, Redis-backed caching with in-memory fallback, and updated service methods to support fallback data retrieval and error handling during service degradation. Changes
Sequence DiagramssequenceDiagram
participant Client
participant Routes as routes_sync.py
participant Breaker as CircuitBreaker
participant HygraphClient as HygraphClient
participant Cache as RedisCache
participant Hygraph as Hygraph API
Client->>Routes: Pull Request
Routes->>Breaker: call(execute)
alt Circuit CLOSED
Breaker->>HygraphClient: execute(query)
HygraphClient->>Hygraph: HTTP POST
Hygraph->>HygraphClient: Response
HygraphClient->>Cache: set_json(data, ttl)
HygraphClient->>Breaker: Return data
Breaker->>Routes: Return data
else Circuit OPEN
Breaker->>Routes: CircuitOpenError(fallback_result)
end
alt Has fallback_result
Routes->>Cache: Return cached data + from_cache=true
else No fallback_result
Routes->>Client: 503 Service Unavailable
end
sequenceDiagram
participant Cache as RedisCache
participant Redis as Redis
participant Memory as In-Memory Store
participant Consumer as Consumer
Consumer->>Cache: get_json(key)
alt Redis available
Cache->>Redis: GET prefixed_key
alt Found & Valid JSON
Redis->>Cache: Return value
Cache->>Consumer: Return parsed value
else Redis error/no data
Cache->>Memory: Check expiry & fetch
Memory->>Cache: Return value or None
Cache->>Consumer: Return value
end
else Redis unavailable
Cache->>Memory: Check expiry & fetch
Memory->>Cache: Return value or None
Cache->>Consumer: Return value
end
Consumer->>Cache: set_json(key, value, ttl)
Cache->>Redis: SET with EX
alt Redis write success
Cache->>Consumer: Stored in Redis
else Redis write failed or fallback enabled
Cache->>Memory: Store with expiry
Cache->>Consumer: Stored in memory
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes The PR introduces multiple new service abstractions (circuit breaker, cache, Hygraph client) with stateful logic and integrates them across route handlers and service layers. While individual components follow clear patterns, understanding their interactions, state transitions, TTL/expiry mechanics, and error propagation paths requires careful analysis of each module and their integration points. Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting
| try: | ||
| result = func(*args, **kwargs) | ||
| except Exception: | ||
| self._record_failure() | ||
| raise |
There was a problem hiding this comment.
Raise fallback when half-open call still fails
When the breaker has been open long enough _ready_for_half_open() flips it to HALF_OPEN and the guarded function is executed directly. If that call fails (httpx error, timeout, etc.) the except block simply re-raises the original exception after reopening the circuit, never invoking the supplied fallback or wrapping the failure in CircuitOpenError. The admin route only serves cached data when it receives CircuitOpenError, so a request during this half-open probe returns a 500 with no fallback even if a warm cache is available and the circuit immediately re-opens. The half-open failure path should propagate a CircuitOpenError (including the fallback result) so cached responses are served consistently until the dependency recovers.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 9
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (9)
backend/api/config.py(1 hunks)backend/api/metrics.py(1 hunks)backend/api/routes_sync.py(2 hunks)backend/pyproject.toml(1 hunks)backend/services/cache.py(1 hunks)backend/services/circuit_breaker.py(1 hunks)backend/services/hygraph_client.py(1 hunks)backend/services/hygraph_service.py(2 hunks)docs/hygraph-runbook.md(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
backend/api/routes_sync.py (2)
backend/services/hygraph_service.py (1)
HygraphService(34-191)backend/services/circuit_breaker.py (1)
CircuitOpenError(13-18)
backend/services/hygraph_client.py (1)
backend/services/circuit_breaker.py (3)
CircuitBreaker(22-104)CircuitOpenError(13-18)call(81-104)
backend/services/hygraph_service.py (4)
backend/api/config.py (1)
get_settings(70-73)backend/services/cache.py (3)
RedisCache(17-73)get_json(36-57)set_json(59-73)backend/services/circuit_breaker.py (1)
CircuitOpenError(13-18)backend/services/hygraph_client.py (3)
HygraphClient(19-77)HygraphGraphQLError(11-16)execute(46-74)
🪛 GitHub Actions: CI
docs/hygraph-runbook.md
[error] One or more static type checks (mypy) and lint checks failed in pre-commit suite.
backend/api/metrics.py
[error] One or more static type checks (mypy) and lint checks failed in pre-commit suite.
backend/api/routes_sync.py
[error] 113-113: Incompatible return value type: JSONResponse is not a dict[str, Any] (myPy return type mismatch).
[error] 174-174: Type of counts variable used in inc() call does not match expected type per mypy (line too long due to formatting).
[error] 203-203: Exception handler path returns a dict vs JSONResponse in typing; mypy reports type mismatch.
[error] One or more static type checks (mypy) and lint checks failed in pre-commit suite.
backend/services/circuit_breaker.py
[error] One or more static type checks (mypy) and lint checks failed in pre-commit suite.
backend/api/config.py
[error] One or more static type checks (mypy) and lint checks failed in pre-commit suite.
backend/services/hygraph_client.py
[error] One or more static type checks (mypy) and lint checks failed in pre-commit suite.
backend/pyproject.toml
[error] One or more static type checks (mypy) and lint checks failed in pre-commit suite.
backend/services/hygraph_service.py
[error] 138-138: Function is missing a type annotation for one or more arguments (no-untyped-def).
[error] One or more static type checks (mypy) and lint checks failed in pre-commit suite.
backend/services/cache.py
[error] One or more static type checks (mypy) and lint checks failed in pre-commit suite.
🪛 Gitleaks (8.28.0)
docs/hygraph-runbook.md
[high] 9-15: Discovered a potential basic authorization token provided in a curl command, which could compromise the curl accessed resource.
(curl-auth-user)
🪛 LanguageTool
docs/hygraph-runbook.md
[grammar] ~1-~1: Use correct spacing
Context: # Hygraph Circuit Breaker Runbook This runbook documents the operational s...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~3-~3: Consider adjusting word choice based on regional variation
Context: ...g warming the cache, validating breaker behaviour, and observing the related Prometheus m...
(QB_NEW_EN_OTHER_ERROR_IDS_26)
[grammar] ~3-~3: Use correct spacing
Context: ...bserving the related Prometheus metrics. ## Prerequisites * Backend application run...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~5-~5: Use correct spacing
Context: ...ed Prometheus metrics. ## Prerequisites * Backend application running with access ...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~9-~9: Use correct spacing
Context: ...ured via environment variables. * curl (or similar HTTP client) available for m...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~9-~9: Use correct spacing
Context: ...TTP client) available for manual checks. ## Warming the Cache 1. Ensure Redis is re...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~11-~11: Use correct spacing
Context: ...for manual checks. ## Warming the Cache 1. Ensure Redis is reachable: ```bash ...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~13-~13: Use correct spacing
Context: ...the Cache 1. Ensure Redis is reachable: bash redis-cli -u "${REDIS_URL:-redis://localhost:6379/0}" ping 2. Trigger a pull for each catalog type so ...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~17-~17: Use correct spacing
Context: ... that fresh data is stored in the cache: bash AUTH_HEADER="Authorization: Bearer ${API_WRITE_TOKEN}" curl -X POST http://localhost:8000/api/sync/hygraph/pull \ -H "Content-Type: application/json" \ -H "$AUTH_HEADER" \ -d '{"type": "materials"}' curl -X POST http://localhost:8000/api/sync/hygraph/pull \ -H "Content-Type: application/json" \ -H "$AUTH_HEADER" \ -d '{"type": "modules"}' curl -X POST http://localhost:8000/api/sync/hygraph/pull \ -H "Content-Type: application/json" \ -H "$AUTH_HEADER" \ -d '{"type": "systems"}' 3. Verify cached responses by repeating one...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~33-~33: Use correct spacing
Context: ...s to true when the fallback is served. ## Exercising Breaker States 1. **Closed →...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~35-~35: Use correct spacing
Context: ...is served. ## Exercising Breaker States 1. Closed → Open transition: Temporarily ...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~37-~37: There might be a mistake here.
Context: ...mporarily simulate Hygraph failure (for example by pointing HYGRAPH_ENDPOINT to `http...
(QB_NEW_EN_OTHER)
[grammar] ~37-~37: There might be a mistake here.
Context: ...ure threshold, the response should be a 503 Service Unavailable and hygraph_cb_trips_total will incre...
(QB_NEW_EN_OTHER)
[grammar] ~37-~37: There might be a mistake here.
Context: ...hygraph_cb_trips_total will increment. 2. Open fallback: With the endpoint still...
(QB_NEW_EN)
[grammar] ~39-~39: Use correct spacing
Context: ...eturn live data ("from_cache": false). ## Observability * Scrape /metrics to co...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~41-~41: Use correct spacing
Context: ..."from_cache": false). ## Observability * Scrape /metrics` to confirm the followi...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~43-~43: Use correct spacing
Context: ...ricsto confirm the following counters: *hygraph_cb_trips_total` – total number of times the circuit has ...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~44-~44: There might be a problem here.
Context: ... number of times the circuit has opened. * hygraph_cb_fallback_total – count of responses served from the cac...
(QB_NEW_EN_MERGED_MATCH)
[grammar] ~45-~45: There might be a mistake here.
Context: ...sponses served from the cached fallback. * Existing sync counters (`sync_success_to...
(QB_NEW_EN)
[grammar] ~46-~46: Use correct spacing
Context: ...l) continue to track per-type outcomes. * Background webhook synchronisation (POS...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~48-~48: Consider adjusting word choice based on regional variation
Context: ...er-type outcomes. * Background webhook synchronisation (POST /api/sync/hygraph) surfaces ide...
(QB_NEW_EN_OTHER_ERROR_IDS_26)
[grammar] ~48-~48: Consider adjusting word choice based on regional variation
Context: ...nc/hygraph`) surfaces identical breaker behaviour in logs and metrics when the admin pull...
(QB_NEW_EN_OTHER_ERROR_IDS_26)
[grammar] ~48-~48: Use correct spacing
Context: ...the admin pull uses the cached fallback. Keep the cache warm before scheduled mai...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
🪛 markdownlint-cli2 (0.18.1)
docs/hygraph-runbook.md
14-14: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
16-16: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
18-18: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
32-32: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🔇 Additional comments (8)
backend/api/config.py (1)
45-48: LGTM!The redis_url configuration field is well-defined with a sensible default for local development and a clear description.
backend/api/routes_sync.py (1)
191-200: Circuit breaker error handling is well-structured.The error handling correctly:
- Increments the
hygraph_cb_trips_totalmetric when the circuit opens- Returns cached data with appropriate flags when a fallback is available
- Returns a 503 Service Unavailable response when no fallback exists
The logic aligns well with the circuit breaker pattern and provides graceful degradation.
Note: The pipeline failures indicate pre-existing type annotation issues on lines 113, 174, and 203 that are unrelated to your circuit breaker changes. These should be addressed separately.
backend/api/metrics.py (1)
23-31: LGTM!The new Prometheus metrics are well-defined:
hygraph_cb_trips_totaltracks circuit breaker open eventshygraph_cb_fallback_totaltracks requests served from cache fallback- Both follow Prometheus naming conventions and have clear descriptions
These metrics will provide valuable observability for the circuit breaker behavior.
docs/hygraph-runbook.md (1)
1-50: Well-structured operational runbook.The documentation provides clear, actionable guidance for:
- Cache warming procedures
- Circuit breaker state transitions and validation
- Observability and metrics interpretation
- Production readiness considerations
The use of environment variables (e.g.,
${API_WRITE_TOKEN}) in examples is the correct approach for demonstrating secure practices.Note: The Gitleaks alert on lines 9-15 is a false positive. The runbook correctly demonstrates using environment variables rather than hardcoding credentials.
backend/services/hygraph_service.py (2)
94-133: Well-designed caching and fallback strategy.The
_fetch_collectionmethod correctly:
- Checks for cached data before making external calls
- Prepares a fallback closure if cached data exists
- Uses the circuit breaker to guard external calls
- Propagates circuit breaker errors for handling by callers
- Caches successful responses with TTL
The use of
asyncio.to_threadto wrap the synchronous client is appropriate for the async context.
76-92: Robust GraphQL response normalization.The
_extract_itemshelper correctly handles multiple GraphQL response shapes:
- Direct arrays
- Relay-style connections with
nodes- Relay-style connections with
edgescontainingnodeThis defensive approach prevents surprises from different Hygraph schema patterns.
backend/services/hygraph_client.py (2)
22-40: Well-designed client initialization with good defaults.The constructor correctly:
- Validates the endpoint is non-empty (line 31-32)
- Creates or accepts an httpx.Client with timeout (line 35)
- Provides a default CircuitBreaker instance (line 36)
- Sets appropriate headers including authorization (lines 37-40)
The use of
setdefaultensures headers aren't overwritten if the client is passed in with pre-configured headers.
46-74: Circuit breaker integration is correctly implemented.The
executemethod properly:
- Defines an inner
_requestfunction for the circuit breaker to wrap- Validates HTTP status with
raise_for_status()(line 60)- Checks for GraphQL errors and raises a custom exception (lines 62-64)
- Returns data or an empty dict fallback (lines 65-68)
- Wraps the call with the circuit breaker, passing the optional fallback (line 71)
- Re-raises
CircuitOpenErrorto allow callers to handle fallback semantics (lines 72-74)The synchronous httpx.Client usage is appropriate since the service layer wraps calls with
asyncio.to_thread.
backend/pyproject.toml
Outdated
| "httpx>=0.27.2", | ||
| "redis>=5.2.1", |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
Consider updating dependency versions to latest stable releases.
The specified minimum versions are older than the latest stable releases available:
- httpx: specified >=0.27.2, but 0.28.1 is the latest stable release with bug fixes (notably SSL verify=False with client certificates)
- redis: specified >=5.2.1, but 6.4.0 is the latest stable release
Based on learnings
While the >= operator provides flexibility, consider:
- Testing with the latest versions and updating the minimums to avoid known issues
- Using more restrictive bounds (e.g.,
>=0.28.0,<0.29) if you want to avoid potential breaking changes in future major releases
To check for security advisories or known issues with the specified versions, run:
🏁 Script executed:
#!/bin/bash
# Check for security advisories on the pinned package versions
pip-audit --desc -r <(echo -e "httpx>=0.27.2\nredis>=5.2.1")Length of output: 47
🏁 Script executed:
#!/bin/bash
python3 - << 'EOF'
import json, urllib.request
resp = urllib.request.urlopen("https://pypi.org/pypi/httpx/json")
print("httpx latest:", json.load(resp)["info"]["version"])
resp = urllib.request.urlopen("https://pypi.org/pypi/redis/json")
print("redis latest:", json.load(resp)["info"]["version"])
EOFLength of output: 2581
🏁 Script executed:
#!/bin/bash
python3 << 'EOF'
import ssl, json, urllib.request
ctx = ssl._create_unverified_context()
for pkg in ["httpx", "redis"]:
url = f"https://pypi.org/pypi/{pkg}/json"
data = urllib.request.urlopen(url, context=ctx).read()
print(f"{pkg} latest: {json.loads(data)['info']['version']}")
EOFLength of output: 40
Update minimum dependency versions to latest stable releases.
- httpx: bump
>=0.27.2→>=0.28.1 - redis: bump
>=5.2.1→>=6.4.0
Consider adding upper bounds (e.g., <0.29, <7.0) to guard against future breaking changes. Test compatibility with the updated versions and run a vulnerability scan before merging.
🤖 Prompt for AI Agents
In backend/pyproject.toml around lines 18 to 19, the pinned minimum versions for
httpx and redis are outdated; update "httpx>=0.27.2" to "httpx>=0.28.1" and
"redis>=5.2.1" to "redis>=6.4.0", and optionally add conservative upper bounds
like "<0.29" and "<7.0" to each requirement to guard against breaking changes;
after changing the versions run your test suite and a dependency vulnerability
scan to verify compatibility before merging.
| try: | ||
| from redis import asyncio as redis_asyncio | ||
| except Exception: # pragma: no cover - optional dependency guard | ||
| redis_asyncio = None # type: ignore[assignment] |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Narrow the exception handler scope.
The bare Exception catch is too broad and could mask import errors that should be debugged. Consider catching specific exceptions like ImportError or ModuleNotFoundError for clearer intent.
Apply this diff:
try:
from redis import asyncio as redis_asyncio
-except Exception: # pragma: no cover - optional dependency guard
+except (ImportError, ModuleNotFoundError): # pragma: no cover - optional dependency guard
redis_asyncio = None # type: ignore[assignment]🤖 Prompt for AI Agents
In backend/services/cache.py around lines 10 to 13, the try/except currently
catches all Exceptions; narrow it to only import-related errors by catching
ImportError or ModuleNotFoundError (e.g., except (ImportError,
ModuleNotFoundError):) to avoid masking other runtime errors, preserve the
pragma no cover and the redis_asyncio = None # type: ignore[assignment]
fallback.
| def __post_init__(self) -> None: | ||
| if redis_asyncio is not None: | ||
| try: | ||
| self._client = redis_asyncio.from_url(self.url, decode_responses=True) | ||
| except Exception: | ||
| self._client = None |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add logging for Redis connection failures.
The __post_init__ method silently swallows connection errors. While the fallback to None is intentional, logging the failure would improve observability when Redis is misconfigured or unavailable.
Consider adding a logger and logging connection failures:
+import logging
+
+logger = logging.getLogger(__name__)
+
def __post_init__(self) -> None:
if redis_asyncio is not None:
try:
self._client = redis_asyncio.from_url(self.url, decode_responses=True)
- except Exception:
+ except Exception as e:
+ logger.warning("Failed to connect to Redis at %s: %s. Using memory fallback.", self.url, e)
self._client = None🤖 Prompt for AI Agents
In backend/services/cache.py around lines 26 to 31, the __post_init__ currently
swallows exceptions when creating the Redis client; add a module-level logger
(import logging and getLogger(__name__)) and update the except block to log the
failure (include the exception details via exc_info=True or logging.exception
and include the Redis URL or a masked form for context) before setting
self._client = None so the fallback behavior remains but failures are
observable.
| if self._client is not None: | ||
| try: | ||
| raw = await self._client.get(full_key) | ||
| except Exception: | ||
| raw = None | ||
| if raw is not None: | ||
| try: | ||
| return json.loads(raw) | ||
| except json.JSONDecodeError: | ||
| return None |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Narrow exception handling and add logging.
The bare except Exception on lines 41 and 46 catches all exceptions, including programming errors. Consider:
- Catching specific Redis exceptions (e.g.,
ConnectionError,TimeoutError) - Logging when falling back to memory cache for observability
If you narrow the exception types and add logging, it will be easier to diagnose issues in production:
if self._client is not None:
try:
raw = await self._client.get(full_key)
- except Exception:
+ except Exception as e:
+ logger.debug("Redis get failed for key %s: %s", full_key, e)
raw = None
if raw is not None:
try:
return json.loads(raw)
except json.JSONDecodeError:
+ logger.warning("Invalid JSON in Redis cache for key %s", full_key)
return None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if self._client is not None: | |
| try: | |
| raw = await self._client.get(full_key) | |
| except Exception: | |
| raw = None | |
| if raw is not None: | |
| try: | |
| return json.loads(raw) | |
| except json.JSONDecodeError: | |
| return None | |
| if self._client is not None: | |
| try: | |
| raw = await self._client.get(full_key) | |
| except Exception as e: | |
| logger.debug("Redis get failed for key %s: %s", full_key, e) | |
| raw = None | |
| if raw is not None: | |
| try: | |
| return json.loads(raw) | |
| except json.JSONDecodeError: | |
| logger.warning("Invalid JSON in Redis cache for key %s", full_key) | |
| return None |
🤖 Prompt for AI Agents
In backend/services/cache.py around lines 38 to 47, replace the two broad
"except Exception" blocks with targeted exception handling: catch the Redis
client-specific errors (e.g., redis.exceptions.ConnectionError,
redis.exceptions.TimeoutError or your client’s equivalent) when calling
self._client.get(full_key) and log the exception with context before falling
back to the in-memory cache; for the JSON decode branch catch
json.JSONDecodeError only and log the raw payload and decode error before
returning None. Ensure the module has a logger (e.g., using
logging.getLogger(__name__)) and include the exception instance in the log calls
for observability.
| if self._client is not None: | ||
| try: | ||
| await self._client.set(full_key, dumped, ex=ttl) | ||
| except Exception: | ||
| # Fall back to memory cache if Redis write fails. | ||
| pass |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add logging for Redis write failures.
Silent exception swallowing on line 65 makes it difficult to diagnose Redis write failures. Add logging to track when the fallback is being used.
Apply this diff:
if self._client is not None:
try:
await self._client.set(full_key, dumped, ex=ttl)
- except Exception:
+ except Exception as e:
+ logger.debug("Redis set failed for key %s: %s", full_key, e)
# Fall back to memory cache if Redis write fails.
pass📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if self._client is not None: | |
| try: | |
| await self._client.set(full_key, dumped, ex=ttl) | |
| except Exception: | |
| # Fall back to memory cache if Redis write fails. | |
| pass | |
| if self._client is not None: | |
| try: | |
| await self._client.set(full_key, dumped, ex=ttl) | |
| except Exception as e: | |
| logger.debug("Redis set failed for key %s: %s", full_key, e) | |
| # Fall back to memory cache if Redis write fails. | |
| pass |
🤖 Prompt for AI Agents
In backend/services/cache.py around lines 62 to 67, the except block currently
swallows all exceptions when Redis set fails; change it to log the failure and
that the code is falling back to the in-memory cache. In the except block, call
the appropriate logger (use self._logger if available, otherwise obtain module
logger via logging.getLogger(__name__)) to record a descriptive message
including the full_key and ttl and the exception details (use logger.exception
or logger.error(..., exc_info=True)), then continue to fall back to memory cache
as before.
| CLOSED: str = "CLOSED" | ||
| OPEN: str = "OPEN" | ||
| HALF_OPEN: str = "HALF_OPEN" |
There was a problem hiding this comment.
Mark breaker state sentinels as ClassVar constants.
Declaring CLOSED/OPEN/HALF_OPEN without ClassVar makes them dataclass fields, so they leak into __init__, equality, and asdict. Anyone instantiating CircuitBreaker(OPEN="foo") (even inadvertently via config) will corrupt the state machine. Mark them as ClassVar (and import it) so they stay true class constants.
-from typing import Any, Callable, Optional, TypeVar
+from typing import Any, Callable, ClassVar, Optional, TypeVar
@@
- CLOSED: str = "CLOSED"
- OPEN: str = "OPEN"
- HALF_OPEN: str = "HALF_OPEN"
+ CLOSED: ClassVar[str] = "CLOSED"
+ OPEN: ClassVar[str] = "OPEN"
+ HALF_OPEN: ClassVar[str] = "HALF_OPEN"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| CLOSED: str = "CLOSED" | |
| OPEN: str = "OPEN" | |
| HALF_OPEN: str = "HALF_OPEN" | |
| from typing import Any, Callable, ClassVar, Optional, TypeVar | |
| - CLOSED: str = "CLOSED" | |
| - OPEN: str = "OPEN" | |
| CLOSED: ClassVar[str] = "CLOSED" | |
| OPEN: ClassVar[str] = "OPEN" | |
| HALF_OPEN: ClassVar[str] = "HALF_OPEN" |
🤖 Prompt for AI Agents
In backend/services/circuit_breaker.py around lines 34 to 36, the state sentinel
attributes CLOSED, OPEN, and HALF_OPEN are currently plain annotated strings
which makes them dataclass fields; change their annotations to ClassVar[str] and
add an import for ClassVar from typing so these stay true class-level constants
(e.g., from typing import ClassVar) and are excluded from the
dataclass-generated __init__, equality, and asdict behavior.
| class HygraphClient: | ||
| """HTTPX based client for Hygraph GraphQL requests.""" | ||
|
|
||
| def __init__( | ||
| self, | ||
| endpoint: str, | ||
| token: str, | ||
| *, | ||
| timeout: float = 6.0, | ||
| breaker: Optional[CircuitBreaker] = None, | ||
| client: Optional[httpx.Client] = None, | ||
| ) -> None: | ||
| if not endpoint: | ||
| raise ValueError("Hygraph endpoint must be configured") | ||
| self._endpoint = endpoint | ||
| self._token = token | ||
| self._client = client or httpx.Client(timeout=timeout) | ||
| self._breaker = breaker or CircuitBreaker() | ||
| self._client.headers.setdefault("Content-Type", "application/json") | ||
| self._client.headers.setdefault("Accept", "application/json") | ||
| if token: | ||
| self._client.headers.setdefault("Authorization", f"Bearer {token}") | ||
|
|
||
| @property | ||
| def breaker(self) -> CircuitBreaker: | ||
| return self._breaker | ||
|
|
||
| def execute( | ||
| self, | ||
| query: str, | ||
| variables: Optional[Dict[str, Any]] = None, | ||
| *, | ||
| fallback: Optional[Callable[[], Any]] = None, | ||
| ) -> Dict[str, Any]: | ||
| """Run the GraphQL ``query`` through the circuit breaker.""" | ||
|
|
||
| def _request() -> Dict[str, Any]: | ||
| response = self._client.post( | ||
| self._endpoint, | ||
| json={"query": query, "variables": variables or {}}, | ||
| ) | ||
| response.raise_for_status() | ||
| payload = response.json() | ||
| errors = payload.get("errors") | ||
| if errors: | ||
| raise HygraphGraphQLError(errors) | ||
| data = payload.get("data") | ||
| if not isinstance(data, dict): | ||
| return {} | ||
| return data | ||
|
|
||
| try: | ||
| return self._breaker.call(_request, fallback=fallback) | ||
| except CircuitOpenError: | ||
| # Re-raise so callers can handle fallback semantics. | ||
| raise | ||
|
|
||
| def close(self) -> None: | ||
| self._client.close() |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider supporting async httpx.AsyncClient for better async integration.
While the current synchronous client works with asyncio.to_thread, native async support would eliminate thread overhead and improve performance. Since httpx provides httpx.AsyncClient with an identical API, you could:
- Add an async variant of HygraphClient (e.g.,
AsyncHygraphClient) - Update the service to use the async client directly without
asyncio.to_thread
This is optional and not required for correctness, but would be more idiomatic for async Python.
Example async client usage:
# In hygraph_client.py
class AsyncHygraphClient:
def __init__(self, endpoint: str, token: str, *, timeout: float = 6.0, ...):
self._client = httpx.AsyncClient(timeout=timeout)
# ... rest of init
async def execute(self, query: str, ...) -> Dict[str, Any]:
async def _request() -> Dict[str, Any]:
response = await self._client.post(...)
# ... rest of logic
return self._breaker.call(_request, fallback=fallback)
# In hygraph_service.py
async def _fetch_collection(...):
# Remove asyncio.to_thread wrapper
data = await client.execute(query, ...)| _settings = get_settings() | ||
| _cache = RedisCache(url=getattr(_settings, "redis_url", "redis://localhost:6379/0")) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Simplify settings access.
Since redis_url is a required field in Settings with a default value, using getattr with a fallback is unnecessary and verbose.
Apply this diff:
_settings = get_settings()
-_cache = RedisCache(url=getattr(_settings, "redis_url", "redis://localhost:6379/0"))
+_cache = RedisCache(url=_settings.redis_url)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| _settings = get_settings() | |
| _cache = RedisCache(url=getattr(_settings, "redis_url", "redis://localhost:6379/0")) | |
| _settings = get_settings() | |
| _cache = RedisCache(url=_settings.redis_url) |
🤖 Prompt for AI Agents
In backend/services/hygraph_service.py around lines 17 to 18, the code uses
getattr(_settings, "redis_url", "redis://localhost:6379/0") to build the
RedisCache URL, which is unnecessary because redis_url is a required Settings
attribute with a default; replace the getattr usage with a direct property
access _settings.redis_url when constructing RedisCache (i.e., _cache =
RedisCache(url=_settings.redis_url)) to simplify and clarify the code.
| return result | ||
|
|
||
| @classmethod | ||
| async def pull_materials(cls, db, page_size: Optional[int] = None) -> Dict[str, Any]: |
There was a problem hiding this comment.
Add type annotation for the db parameter.
The db parameter is missing a type annotation, causing mypy failures. Based on the usage in routes_sync.py, this should be Session from SQLAlchemy.
Apply this diff to all pull methods (lines 136, 148, 160, 172):
+from sqlalchemy.orm import Session
+
@classmethod
-async def pull_materials(cls, db, page_size: Optional[int] = None) -> Dict[str, Any]:
+async def pull_materials(cls, db: Session, page_size: Optional[int] = None) -> Dict[str, Any]:Repeat for pull_modules, pull_systems, and pull_all methods.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async def pull_materials(cls, db, page_size: Optional[int] = None) -> Dict[str, Any]: | |
| from sqlalchemy.orm import Session | |
| @classmethod | |
| async def pull_materials(cls, db: Session, page_size: Optional[int] = None) -> Dict[str, Any]: |
🤖 Prompt for AI Agents
In backend/services/hygraph_service.py around lines 136, 148, 160 and 172, the
async pull_* methods (pull_materials, pull_modules, pull_systems, pull_all) are
missing a type annotation for the db parameter; add the SQLAlchemy type
annotation "db: Session" to each method signature and ensure "from
sqlalchemy.orm import Session" is imported at the top of the file (add the
import if missing). This will satisfy mypy and match usage in routes_sync.py.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
backend/services/circuit_breaker.py (1)
34-36: ClassVar issue already flagged.The previous review correctly identified that these state sentinels must be marked as
ClassVar[str]to prevent them from becoming dataclass fields.
| @dataclass | ||
| class CircuitBreaker: | ||
| """Dataclass based circuit breaker with simple HALF_OPEN support.""" | ||
|
|
||
| failure_threshold: int = 5 | ||
| recovery_timeout: float = 30.0 | ||
| half_open_success_threshold: int = 1 | ||
| time_provider: Callable[[], float] = time.monotonic | ||
| state: str = field(default="CLOSED", init=False) | ||
| failure_count: int = field(default=0, init=False) | ||
| success_count: int = field(default=0, init=False) | ||
| opened_at: Optional[float] = field(default=None, init=False) | ||
|
|
||
| CLOSED: str = "CLOSED" | ||
| OPEN: str = "OPEN" | ||
| HALF_OPEN: str = "HALF_OPEN" | ||
|
|
||
| def _transition_to_open(self) -> None: | ||
| self.state = self.OPEN | ||
| self.opened_at = self.time_provider() | ||
| self.failure_count = 0 | ||
| self.success_count = 0 | ||
|
|
||
| def _transition_to_half_open(self) -> None: | ||
| self.state = self.HALF_OPEN | ||
| self.opened_at = None | ||
| self.failure_count = 0 | ||
| self.success_count = 0 | ||
|
|
||
| def _transition_to_closed(self) -> None: | ||
| self.state = self.CLOSED | ||
| self.opened_at = None | ||
| self.failure_count = 0 | ||
| self.success_count = 0 | ||
|
|
||
| def _ready_for_half_open(self) -> bool: | ||
| if self.opened_at is None: | ||
| return False | ||
| return (self.time_provider() - self.opened_at) >= self.recovery_timeout | ||
|
|
||
| def _record_failure(self) -> None: | ||
| if self.state == self.HALF_OPEN: | ||
| self._transition_to_open() | ||
| return | ||
| self.failure_count += 1 | ||
| if self.failure_count >= self.failure_threshold: | ||
| self._transition_to_open() | ||
|
|
||
| def _record_success(self) -> None: | ||
| if self.state == self.HALF_OPEN: | ||
| self.success_count += 1 | ||
| if self.success_count >= self.half_open_success_threshold: | ||
| self._transition_to_closed() | ||
| return | ||
| elif self.state == self.OPEN: | ||
| # Should not happen, but close proactively. | ||
| self._transition_to_closed() | ||
| else: | ||
| self.failure_count = 0 | ||
|
|
There was a problem hiding this comment.
Critical: Circuit breaker is not thread-safe.
In a concurrent web environment, multiple request threads will access and modify state, failure_count, success_count, and opened_at without synchronization. This causes data races:
- Lost counter increments (two threads both read 4, both write 5, final value is 5 instead of 6)
- Inconsistent state transitions (multiple threads calling
_transition_to_open()concurrently) - Race between reading and writing
opened_at
For production use with concurrent requests, add a threading.Lock to protect all state access and transitions.
Recommended: Add parameter validation.
Consider validating that failure_threshold, half_open_success_threshold are positive integers and recovery_timeout is positive to prevent edge cases.
Apply this diff to add thread safety:
import time
+import threading
from dataclasses import dataclass, field
from typing import Any, Callable, Optional, TypeVar
@dataclass
class CircuitBreaker:
"""Dataclass based circuit breaker with simple HALF_OPEN support."""
failure_threshold: int = 5
recovery_timeout: float = 30.0
half_open_success_threshold: int = 1
time_provider: Callable[[], float] = time.monotonic
state: str = field(default="CLOSED", init=False)
failure_count: int = field(default=0, init=False)
success_count: int = field(default=0, init=False)
opened_at: Optional[float] = field(default=None, init=False)
+ _lock: threading.Lock = field(default_factory=threading.Lock, init=False, repr=False)
CLOSED: str = "CLOSED"
OPEN: str = "OPEN"
HALF_OPEN: str = "HALF_OPEN"
+
+ def __post_init__(self) -> None:
+ if self.failure_threshold <= 0:
+ raise ValueError("failure_threshold must be positive")
+ if self.half_open_success_threshold <= 0:
+ raise ValueError("half_open_success_threshold must be positive")
+ if self.recovery_timeout <= 0:
+ raise ValueError("recovery_timeout must be positive")Then wrap all state-modifying operations in the call() method with the lock.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @dataclass | |
| class CircuitBreaker: | |
| """Dataclass based circuit breaker with simple HALF_OPEN support.""" | |
| failure_threshold: int = 5 | |
| recovery_timeout: float = 30.0 | |
| half_open_success_threshold: int = 1 | |
| time_provider: Callable[[], float] = time.monotonic | |
| state: str = field(default="CLOSED", init=False) | |
| failure_count: int = field(default=0, init=False) | |
| success_count: int = field(default=0, init=False) | |
| opened_at: Optional[float] = field(default=None, init=False) | |
| CLOSED: str = "CLOSED" | |
| OPEN: str = "OPEN" | |
| HALF_OPEN: str = "HALF_OPEN" | |
| def _transition_to_open(self) -> None: | |
| self.state = self.OPEN | |
| self.opened_at = self.time_provider() | |
| self.failure_count = 0 | |
| self.success_count = 0 | |
| def _transition_to_half_open(self) -> None: | |
| self.state = self.HALF_OPEN | |
| self.opened_at = None | |
| self.failure_count = 0 | |
| self.success_count = 0 | |
| def _transition_to_closed(self) -> None: | |
| self.state = self.CLOSED | |
| self.opened_at = None | |
| self.failure_count = 0 | |
| self.success_count = 0 | |
| def _ready_for_half_open(self) -> bool: | |
| if self.opened_at is None: | |
| return False | |
| return (self.time_provider() - self.opened_at) >= self.recovery_timeout | |
| def _record_failure(self) -> None: | |
| if self.state == self.HALF_OPEN: | |
| self._transition_to_open() | |
| return | |
| self.failure_count += 1 | |
| if self.failure_count >= self.failure_threshold: | |
| self._transition_to_open() | |
| def _record_success(self) -> None: | |
| if self.state == self.HALF_OPEN: | |
| self.success_count += 1 | |
| if self.success_count >= self.half_open_success_threshold: | |
| self._transition_to_closed() | |
| return | |
| elif self.state == self.OPEN: | |
| # Should not happen, but close proactively. | |
| self._transition_to_closed() | |
| else: | |
| self.failure_count = 0 | |
| import time | |
| import threading | |
| from dataclasses import dataclass, field | |
| from typing import Any, Callable, Optional, TypeVar | |
| @dataclass | |
| class CircuitBreaker: | |
| """Dataclass based circuit breaker with simple HALF_OPEN support.""" | |
| failure_threshold: int = 5 | |
| recovery_timeout: float = 30.0 | |
| half_open_success_threshold: int = 1 | |
| time_provider: Callable[[], float] = time.monotonic | |
| state: str = field(default="CLOSED", init=False) | |
| failure_count: int = field(default=0, init=False) | |
| success_count: int = field(default=0, init=False) | |
| opened_at: Optional[float] = field(default=None, init=False) | |
| _lock: threading.Lock = field(default_factory=threading.Lock, init=False, repr=False) | |
| CLOSED: str = "CLOSED" | |
| OPEN: str = "OPEN" | |
| HALF_OPEN: str = "HALF_OPEN" | |
| def __post_init__(self) -> None: | |
| if self.failure_threshold <= 0: | |
| raise ValueError("failure_threshold must be positive") | |
| if self.half_open_success_threshold <= 0: | |
| raise ValueError("half_open_success_threshold must be positive") | |
| if self.recovery_timeout <= 0: | |
| raise ValueError("recovery_timeout must be positive") | |
| def _transition_to_open(self) -> None: | |
| - self.state = self.OPEN | |
| - self.opened_at = self.time_provider() | |
| - self.failure_count = 0 | |
| with self._lock: | |
| self.state = self.OPEN | |
| self.opened_at = self.time_provider() | |
| self.failure_count = 0 | |
| self.success_count = 0 | |
| def _transition_to_half_open(self) -> None: | |
| - self.state = self.HALF_OPEN | |
| - self.opened_at = None | |
| - self.failure_count = 0 | |
| with self._lock: | |
| self.state = self.HALF_OPEN | |
| self.opened_at = None | |
| self.failure_count = 0 | |
| self.success_count = 0 | |
| def _transition_to_closed(self) -> None: | |
| - self.state = self.CLOSED | |
| - self.opened_at = None | |
| - self.failure_count = 0 | |
| with self._lock: | |
| self.state = self.CLOSED | |
| self.opened_at = None | |
| self.failure_count = 0 | |
| self.success_count = 0 | |
| def _ready_for_half_open(self) -> bool: | |
| - if self.opened_at is None: | |
| - return False | |
| with self._lock: | |
| if self.opened_at is None: | |
| return False | |
| return (self.time_provider() - self.opened_at) >= self.recovery_timeout | |
| def _record_failure(self) -> None: | |
| - if self.state == self.HALF_OPEN: | |
| - self._transition_to_open() | |
| - return | |
| - self.failure_count += 1 | |
| - if self.failure_count >= self.failure_threshold: | |
| with self._lock: | |
| if self.state == self.HALF_OPEN: | |
| self._transition_to_open() | |
| return | |
| self.failure_count += 1 | |
| if self.failure_count >= self.failure_threshold: | |
| self._transition_to_open() | |
| def _record_success(self) -> None: | |
| - if self.state == self.HALF_OPEN: | |
| - self.success_count += 1 | |
| - if self.success_count >= self.half_open_success_threshold: | |
| - self._transition_to_closed() | |
| - return | |
| - elif self.state == self.OPEN: | |
| - # Should not happen, but close proactively. | |
| - self._transition_to_closed() | |
| - else: | |
| with self._lock: | |
| if self.state == self.HALF_OPEN: | |
| self.success_count += 1 | |
| if self.success_count >= self.half_open_success_threshold: | |
| self._transition_to_closed() | |
| return | |
| elif self.state == self.OPEN: | |
| # Should not happen, but close proactively. | |
| self._transition_to_closed() | |
| else: | |
| self.failure_count = 0 |
| try: | ||
| result = func(*args, **kwargs) | ||
| except Exception as exc: | ||
| was_half_open = self.state == self.HALF_OPEN | ||
| self._record_failure() | ||
| if was_half_open: | ||
| fallback_result: T | None = fallback() if fallback else None | ||
| raise CircuitOpenError("Circuit breaker is open", fallback_result) from exc | ||
| raise |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Previous review comment appears incorrect.
The existing review claims that half-open failures don't invoke the fallback, but lines 102-104 clearly do invoke fallback() and raise CircuitOpenError when was_half_open is True. The half-open probe failure path correctly propagates fallback results.
Minor: Use Optional[T] for Python 3.9 compatibility.
Lines 94 and 103 use T | None syntax, which requires Python 3.10+. For broader compatibility, use Optional[T] instead.
Apply this diff:
- fallback_result: T | None = fallback() if fallback else None
+ fallback_result: Optional[T] = fallback() if fallback else None(Apply the same change on line 103)
Request verification: Document fallback exception behavior.
If the fallback callable itself raises an exception, that exception will propagate to the caller instead of being caught. Verify this is the intended behavior and consider documenting it in the docstring.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| result = func(*args, **kwargs) | |
| except Exception as exc: | |
| was_half_open = self.state == self.HALF_OPEN | |
| self._record_failure() | |
| if was_half_open: | |
| fallback_result: T | None = fallback() if fallback else None | |
| raise CircuitOpenError("Circuit breaker is open", fallback_result) from exc | |
| raise | |
| try: | |
| result = func(*args, **kwargs) | |
| except Exception as exc: | |
| was_half_open = self.state == self.HALF_OPEN | |
| self._record_failure() | |
| if was_half_open: | |
| fallback_result: Optional[T] = fallback() if fallback else None | |
| raise CircuitOpenError("Circuit breaker is open", fallback_result) from exc | |
| raise |
🤖 Prompt for AI Agents
In backend/services/circuit_breaker.py around lines 94 to 105, replace the
Python-3.10-only union type T | None with typing.Optional[T] on lines 94 and 103
(add/import Optional from typing if not present) to ensure Python 3.9
compatibility; also update the docstring for the circuit breaker to state that
if the provided fallback callable raises an exception, that exception will
propagate to the caller (i.e., fallback exceptions are not caught), and confirm
this behavior is intended.
|
Note Docstrings generation - SUCCESS |
Docstrings generation was requested by @shayancoin. * #108 (comment) The following files were modified: * `backend/api/routes_sync.py` * `backend/services/cache.py` * `backend/services/circuit_breaker.py` * `backend/services/hygraph_client.py` * `backend/services/hygraph_service.py`
…nt` (#274) Docstrings generation was requested by @shayancoin. * #108 (comment) The following files were modified: * `backend/api/routes_sync.py` * `backend/services/cache.py` * `backend/services/circuit_breaker.py` * `backend/services/hygraph_client.py` * `backend/services/hygraph_service.py` Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Summary
Testing
pytest tests/test_sync_routes_metrics.py(fails: missing optional dependency sqlalchemy in the environment)https://chatgpt.com/codex/tasks/task_e_68f12f5fdb708330ae76dd38b5397123
Summary by CodeRabbit
Release Notes
New Features
Documentation