Caution: Review failed. The pull request is closed.

📝 Walkthrough

Adds an end-to-end device-flow authentication system: CLI client and encrypted local storage, API client, backend device API and devicesvc with DB schemas/repos, device token middleware, GCP/GKE credential plumbing for sandbox containers, console verification UI, and unit tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI User
    participant CLIApp as Python CLI
    participant Backend as Go Backend (deviceapi/devicesvc)
    participant DB as PostgreSQL
    participant Console as React Console
    participant Browser as Browser
    CLI->>CLIApp: auth login (initiate)
    CLIApp->>Backend: POST /device/auth/initiate
    Backend->>DB: INSERT device_code (pending)
    Backend-->>CLIApp: {device_code,user_code,verification_url}
    CLIApp->>Browser: open verification_url
    CLIApp->>CLI: display user_code, start polling
    loop Polling
        CLIApp->>Backend: POST /device/auth/poll (device_code)
        Backend->>DB: GET device_code status
        alt pending
            Backend-->>CLIApp: {authorized:false, error:"authorization_pending"}
        else authorized
            Backend->>DB: CREATE device_token, MARK code used
            Backend-->>CLIApp: {authorized:true, access_token, refresh_token, expires_in}
        end
    end
    Browser->>Console: user visits /cli/verify, enters user_code
    Console->>Backend: POST /device/auth/authorize (user_code, org, user) with auth token
    Backend->>DB: UPDATE device_code -> authorized
    Backend-->>Console: success
    Console-->>Browser: show success
    CLIApp->>CLIApp: persist tokens (encrypt), optionally fetch/write GCP creds and start sandbox
```
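The CLI half of the loop above can be sketched in Python. The response shapes (`authorized`, `authorization_pending`) come from the diagram; the `poll_until_authorized` helper and the simulated backend are illustrative assumptions, not code from this PR:

```python
import time

def poll_until_authorized(poll, interval=1.0, max_attempts=60):
    """Poll until the device code is authorized.

    `poll` is any callable returning a dict shaped like the
    /device/auth/poll responses in the diagram.
    """
    for _ in range(max_attempts):
        resp = poll()
        if resp.get("authorized"):
            return resp
        if resp.get("error") != "authorization_pending":
            raise RuntimeError(f"device flow failed: {resp.get('error')}")
        time.sleep(interval)
    raise TimeoutError("device code was never authorized")

# Simulated backend: pending twice, then authorized.
responses = iter([
    {"authorized": False, "error": "authorization_pending"},
    {"authorized": False, "error": "authorization_pending"},
    {"authorized": True, "access_token": "tok", "refresh_token": "ref", "expires_in": 3600},
])
result = poll_until_authorized(lambda: next(responses), interval=0)
print(result["access_token"])  # tok
```

A real client would substitute an HTTP POST to `/device/auth/poll` for the simulated callable.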
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)

📜 Recent review details: Configuration used: defaults; Review profile: CHILL; Plan: Pro; 📒 Files selected for processing (3)
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
services/agent/src/client/go/client.go (1)
113-114: Misleading comment: backoff is linear, not exponential.

The comment states "Exponential backoff" but the implementation `time.Duration(attempt) * time.Second` is linear (1s, 2s, 3s...). Exponential backoff would use powers of 2 or a similar geometric progression.

🔎 Options to fix

Option 1: Correct the comment

```diff
-	// Exponential backoff for retries
+	// Linear backoff for retries
 	delay := time.Duration(attempt) * time.Second
```

Option 2: Implement actual exponential backoff

```diff
-	// Exponential backoff for retries
-	delay := time.Duration(attempt) * time.Second
+	// Exponential backoff for retries
+	delay := time.Duration(1<<uint(attempt)) * time.Second // 2s, 4s, 8s...
```

cli/src/infragpt/main.py (1)
157-158: Invalid model name: `gpt-5` doesn't exist.

This appears to be a typo or placeholder. As of current OpenAI models, this should likely be `gpt-4o` or similar.

🔎 Proposed fix

```diff
 if openai_key and not model_string:
-    model_string = "openai:gpt-5"
+    model_string = "openai:gpt-4o"
     api_key = openai_key
     provider_name = "openai"
```
🧹 Nitpick comments (24)
cli/src/infragpt/images/sandbox/Dockerfile (1)
32-38: LGTM! GKE auth plugin addition is appropriate.

The addition of `google-cloud-cli-gke-gcloud-auth-plugin` correctly enables GKE cluster authentication, which aligns well with the PR's GCP/GKE credentials handling objectives. The installation follows Dockerfile best practices by combining commands and cleaning up apt lists.

Optional suggestion: Consider pinning specific versions of `google-cloud-cli` and the auth plugin for reproducible builds (e.g., `google-cloud-cli=<version>`), though this is a minor enhancement.

services/backend/internal/devicesvc/supporting/postgres/queries/device_code.sql (2)
15-18: Consider using `:execrows` to detect state conflicts.

The `AuthorizeDeviceCode` query uses `:exec`, which doesn't return the number of affected rows. If the device code is not in 'pending' status (e.g., already authorized, used, or expired), the UPDATE will silently succeed but affect 0 rows. The caller has no way to distinguish between success and a no-op.

Consider changing to `:execrows` to return `(int64, error)`, allowing the service layer to detect when no rows were updated:

```diff
--- name: AuthorizeDeviceCode :exec
+-- name: AuthorizeDeviceCode :execrows
 UPDATE device_codes
 SET status = 'authorized', organization_id = $2, user_id = $3
 WHERE user_code = $1 AND status = 'pending';
```

Then the repository can check if `rowsAffected == 0` and return a domain error like `ErrDeviceCodeAlreadyAuthorized` or `ErrDeviceCodeInvalidState`.
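The affected-rows check generalizes beyond sqlc; a minimal stdlib `sqlite3` sketch of the same pattern, where `cursor.rowcount` plays the role of the `:execrows` result (the table layout and error are hypothetical stand-ins):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device_codes (user_code TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO device_codes VALUES ('ABCD-1234', 'pending')")

def authorize(user_code):
    cur = conn.execute(
        "UPDATE device_codes SET status = 'authorized' "
        "WHERE user_code = ? AND status = 'pending'",
        (user_code,),
    )
    # rowcount == 0 means the guarded UPDATE matched nothing: a state conflict
    if cur.rowcount == 0:
        raise ValueError("device code already authorized or in invalid state")

authorize("ABCD-1234")       # succeeds, one row updated
try:
    authorize("ABCD-1234")   # no-op: status is no longer 'pending'
except ValueError as e:
    print("conflict detected:", e)
```

Without the rowcount check, the second call would look exactly like the first to the caller.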
20-23: Consider using `:execrows` to detect state conflicts.

Similar to `AuthorizeDeviceCode`, this query should return the affected row count to detect when the device code is not in the expected 'authorized' state.

```diff
--- name: MarkDeviceCodeAsUsed :exec
+-- name: MarkDeviceCodeAsUsed :execrows
 UPDATE device_codes
 SET status = 'used'
 WHERE device_code = $1 AND status = 'authorized';
```

services/backend/internal/devicesvc/supporting/postgres/schema/device_token.sql (1)
3-4: Add length constraints to token columns.

Using unbounded `TEXT` columns for tokens can lead to performance issues with UNIQUE indexes and unnecessarily large storage. Most access/refresh tokens have predictable maximum lengths (e.g., JWT tokens are typically under 2KB).

Consider specifying reasonable limits:

🔎 Proposed fix

```diff
-    access_token TEXT UNIQUE NOT NULL,
-    refresh_token TEXT UNIQUE NOT NULL,
+    access_token VARCHAR(2048) UNIQUE NOT NULL,
+    refresh_token VARCHAR(2048) UNIQUE NOT NULL,
```

Adjust the length based on your actual token generation mechanism (e.g., 512 for random tokens, 2048 for JWTs).
services/backend/internal/devicesvc/supporting/postgres/queries/device_token.sql (2)
5-13: Consider filtering out revoked and expired tokens in queries.

The `GetDeviceTokenByAccessToken` and `GetDeviceTokenByRefreshToken` queries return tokens regardless of their revocation or expiration status. While they include `revoked_at` and `expires_at` in the result, this places the validation burden on every caller in the application layer.

For security, consider adding WHERE clauses to automatically exclude invalid tokens:

🔎 Proposed fix

```diff
 -- name: GetDeviceTokenByAccessToken :one
 SELECT id, access_token, refresh_token, organization_id, user_id, device_name, expires_at, created_at, revoked_at
 FROM device_tokens
-WHERE access_token = $1;
+WHERE access_token = $1 AND revoked_at IS NULL AND expires_at > NOW();

 -- name: GetDeviceTokenByRefreshToken :one
 SELECT id, access_token, refresh_token, organization_id, user_id, device_name, expires_at, created_at, revoked_at
 FROM device_tokens
-WHERE refresh_token = $1;
+WHERE refresh_token = $1 AND revoked_at IS NULL AND expires_at > NOW();
```

If you need to fetch revoked/expired tokens for auditing purposes, create separate queries with explicit names like `GetDeviceTokenByAccessTokenIncludingRevoked`.
15-23: Consider using `:execrows` to detect already-revoked tokens.

Similar to the device_code queries, these revoke operations use `:exec` and don't return the number of affected rows. If the token is already revoked, the UPDATE succeeds silently but affects 0 rows.

Consider changing to `:execrows` so the service layer can detect and handle this case appropriately (e.g., return a domain error or log the duplicate revocation attempt).

```diff
--- name: RevokeDeviceToken :exec
+-- name: RevokeDeviceToken :execrows
 UPDATE device_tokens
 SET revoked_at = NOW()
 WHERE access_token = $1 AND revoked_at IS NULL;

--- name: RevokeAllDeviceTokensForUser :exec
+-- name: RevokeAllDeviceTokensForUser :execrows
 UPDATE device_tokens
 SET revoked_at = NOW()
 WHERE user_id = $1 AND revoked_at IS NULL;
```

services/backend/deviceapi/handler.go (2)
62-86: Consider consistent JSON error responses for all endpoints.

The `initiateDeviceFlow` handler uses `http.Error()` for errors (lines 64, 71), which returns plain text, but other handlers like `pollDeviceFlow` use JSON-encoded error responses. For API consistency, consider using JSON error responses throughout.

This is a minor consistency issue that could affect API client error handling.
454-474: Unused context variable in validateDeviceToken.

The function calls `ValidateToken` with `r.Context()` and stores the result, but on line 474, it returns `r.Context()` instead of using any context from the validation result. The `ctx` parameter in the return could be misleading if callers expect an enriched context.

If the intent is simply to validate and return the original request context, this is fine, but consider documenting why a fresh context is returned rather than any modified one.

services/backend/internal/devicesvc/service.go (2)
services/backend/internal/devicesvc/service.go (2)
91-150: Consider transaction for token creation and code marking.
PollDeviceFlowcreates a device token (line 134) and then marks the device code as used (line 138). IfMarkAsUsedfails after successful token creation, the token remains in the database but the device code isn't marked as used. A subsequent poll could potentially create duplicate tokens.Consider wrapping these operations in a transaction, or ensuring idempotency by checking if a token already exists for this device code.
🔎 Potential approach
Either:
- Use a database transaction to ensure atomicity
- Add a device_code_id foreign key to device_tokens and check for existing tokens before creation
- Accept eventual consistency if retry logic handles duplicates gracefully
152-166: Clarify authorization status check logic.The logic at lines 158-163 could be clearer. When status is not pending, it checks for expired and then falls through to
ErrDeviceCodeUsed. However, status could also beauthorized(not justusedorexpired).Consider explicitly handling all status cases for clarity:
🔎 Suggested improvement
func (s *Service) AuthorizeDevice(ctx context.Context, userCode string, organizationID, userID uuid.UUID) error { code, err := s.deviceCodeRepo.GetByUserCode(ctx, userCode) if err != nil { return err } - if code.Status != domain.DeviceCodeStatusPending { - if code.Status == domain.DeviceCodeStatusExpired || code.ExpiresAt.Before(time.Now()) { - return domain.ErrDeviceCodeExpired - } - return domain.ErrDeviceCodeUsed + if code.ExpiresAt.Before(time.Now()) { + return domain.ErrDeviceCodeExpired + } + + switch code.Status { + case domain.DeviceCodeStatusPending: + // Continue to authorize + case domain.DeviceCodeStatusExpired: + return domain.ErrDeviceCodeExpired + case domain.DeviceCodeStatusAuthorized, domain.DeviceCodeStatusUsed: + return domain.ErrDeviceCodeUsed + default: + return fmt.Errorf("unexpected device code status: %s", code.Status) } return s.deviceCodeRepo.Authorize(ctx, userCode, organizationID, userID) }cli/CLAUDE.md (1)
7-7: Capitalize "GitHub" per official branding.

Per static analysis hint, the official name uses a capital "H".

🔎 Proposed fix

```diff
-- use `gh` for all github operations
+- use `gh` for all GitHub operations
```

services/console/src/services/deviceService.ts (1)
1-3: Consider HTTPS-only default or validation for production.

Per coding guidelines, services should enforce HTTPS-only communication with the backend API. The default `http://localhost:8080` is acceptable for local development, but consider adding a production safety check.

🔎 Proposed enhancement

```diff
 const API_BASE_URL = import.meta.env.VITE_API_BASE_URL || "http://localhost:8080";
 const DEVICE_API_PREFIX = "/device";
+
+// Warn in production if not using HTTPS
+if (import.meta.env.PROD && !API_BASE_URL.startsWith("https://")) {
+  console.warn("DeviceService: API_BASE_URL should use HTTPS in production");
+}
```

services/console/src/App.tsx (1)
19-19: Consider lazy loading the CLIVerifyPage component.

For bundle optimization, consider lazy loading this page component using React's `lazy()` and `Suspense`, consistent with the route-based code splitting guideline.

🔎 Proposed refactor for lazy loading

```diff
-import CLIVerifyPage from "./pages/cli/VerifyPage";
+const CLIVerifyPage = lazy(() => import("./pages/cli/VerifyPage"));
```

Also ensure React's `lazy` is imported:

```diff
-import { RedirectToSignIn, useAuth } from "@clerk/clerk-react";
+import { lazy, Suspense } from "react";
+import { RedirectToSignIn, useAuth } from "@clerk/clerk-react";
```

Then wrap the route element in `Suspense` (or rely on the parent Suspense boundary if one exists):

```diff
 <Route path="/cli/verify" element={
   <ProtectedRoute>
-    <CLIVerifyPage />
+    <Suspense fallback={<div>Loading...</div>}>
+      <CLIVerifyPage />
+    </Suspense>
   </ProtectedRoute>
 } />
```

Based on learnings, use route-based and component-based code splitting for bundle optimization.
cli/src/infragpt/encryption.py (1)
22-28: Consider handling JSON serialization errors.

If `data` contains non-serializable objects, `json.dumps` will raise. Consider adding explicit error handling or documenting the expectation.

services/console/src/pages/cli/VerifyPage.tsx (3)
15-15: Unused import: `user` from `useUser()` is never referenced.

The `user` variable is destructured but not used anywhere in the component.

🔎 Proposed fix

```diff
-  const { user } = useUser();
+  useUser(); // Hook call kept for Clerk context, but value unused
```

Or remove the hook call entirely if it's not needed for side effects.
30-34: Redundant organization check with potential inconsistency.

The validation checks both `organization?.id` (from Clerk) and `userStore.organizationId` (from store). If these can diverge, consider which source of truth to use. If they should always match, one check may suffice.
45-50: Code normalization duplicated between submission and display formatting.

`handleSubmit` applies `toUpperCase().replace(/[^A-Z0-9-]/g, "")` while `formatUserCode` uses similar but slightly different logic. Consider extracting a single `normalizeCode` utility to ensure consistency.

🔎 Proposed consolidation

```diff
+  const normalizeCode = (value: string): string => {
+    return value.toUpperCase().replace(/[^A-Z0-9]/g, "");
+  };
+
   const formatUserCode = (value: string) => {
-    // Remove non-alphanumeric characters and convert to uppercase
-    const cleaned = value.toUpperCase().replace(/[^A-Z0-9]/g, "");
+    const cleaned = normalizeCode(value);
     // Add hyphen after 4 characters if needed
     if (cleaned.length > 4) {
       return cleaned.slice(0, 4) + "-" + cleaned.slice(4, 8);
     }
     return cleaned;
   };

   // In handleSubmit:
-  userCode.toUpperCase().replace(/[^A-Z0-9-]/g, ""),
+  normalizeCode(userCode.replace(/-/g, "")),
```

Also applies to: 70-78
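The shared-utility idea is easy to check in isolation; a Python transcription of the same normalization logic (the function names mirror the TypeScript suggestion, but these Python versions are hypothetical):

```python
import re

def normalize_code(value: str) -> str:
    """Strip everything except A-Z/0-9 and uppercase, e.g. 'ab-12 cd34' -> 'AB12CD34'."""
    return re.sub(r"[^A-Z0-9]", "", value.upper())

def format_user_code(value: str) -> str:
    """Render a normalized code as XXXX-XXXX for display."""
    cleaned = normalize_code(value)
    if len(cleaned) > 4:
        return cleaned[:4] + "-" + cleaned[4:8]
    return cleaned

print(format_user_code("ab12cd34"))  # AB12-CD34
print(normalize_code("AB12-CD34"))   # AB12CD34
```

Because both the display and submission paths call the same `normalize_code`, the two regexes can no longer drift apart.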
cli/src/infragpt/container.py (2)
188-194: Chain exception for proper traceback preservation.

Per static analysis hint B904, use `raise ... from e` to preserve the exception chain.

🔎 Proposed fix

```diff
 try:
     self.client.images.pull(self.image)
 except docker.errors.APIError as e:
     if not self.client.images.list(name=self.image):
-        raise DockerNotAvailableError(
-            f"Failed to pull sandbox image: {e}\nRun: docker pull {self.image}"
-        )
+        raise DockerNotAvailableError(
+            f"Failed to pull sandbox image: {e}\nRun: docker pull {self.image}"
+        ) from e
```
361-364: Fragile parsing of cluster list output.

Splitting on whitespace assumes a specific output format. Cluster names or locations with unexpected characters could cause misalignment. Consider using `--format="csv[no-heading](name,location)"` for more reliable parsing.

🔎 Proposed fix

```diff
 exit_code, stdout, stderr = self._exec_in_container(
-    'gcloud container clusters list --format="value(name,location)" --limit=1'
+    'gcloud container clusters list --format="csv[no-heading](name,location)" --limit=1'
 )
 if exit_code != 0 or not stdout:
     raise RuntimeError(f"Failed to list clusters: {stderr}")

-parts = stdout.strip().split()
-if len(parts) < 2:
+parts = stdout.strip().split(",")
+if len(parts) != 2:
     raise RuntimeError(f"Invalid cluster list output: {stdout}")
 cluster_name, location = parts[0], parts[1]
```

cli/src/infragpt/api_client.py (2)
95-98: Chain exceptions for proper traceback preservation.

Per static analysis hint B904, re-raised exceptions should use `from` to preserve the original exception context.

🔎 Proposed fix

```diff
 except httpx.TimeoutException:
-    raise InfraGPTAPIError(0, "Request timed out")
+    raise InfraGPTAPIError(0, "Request timed out") from None
 except httpx.ConnectError:
-    raise InfraGPTAPIError(0, f"Could not connect to server: {self.server_url}")
+    raise InfraGPTAPIError(0, f"Could not connect to server: {self.server_url}") from None
```

Using `from None` explicitly suppresses the original exception chain if the additional context isn't helpful, or use `from e` to preserve it.
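The difference between `from e` and `from None` is observable on the exception object itself; a small stdlib-only demonstration (the `InfraGPTAPIError` stand-in here takes a single message, unlike the real class):

```python
class InfraGPTAPIError(Exception):
    """Stand-in for the CLI's API error type."""

def wrap_with_cause():
    try:
        raise TimeoutError("socket timed out")
    except TimeoutError as e:
        raise InfraGPTAPIError("Request timed out") from e      # keeps the chain

def wrap_suppressed():
    try:
        raise TimeoutError("socket timed out")
    except TimeoutError:
        raise InfraGPTAPIError("Request timed out") from None   # hides the chain

try:
    wrap_with_cause()
except InfraGPTAPIError as e:
    print(type(e.__cause__).__name__)   # TimeoutError

try:
    wrap_suppressed()
except InfraGPTAPIError as e:
    print(e.__cause__)                  # None
```

With `from e`, tracebacks show "The above exception was the direct cause of..."; with `from None`, the original `TimeoutError` is hidden entirely.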
170-182: Token validation uses GCP endpoint as a proxy—consider a dedicated endpoint.

`validate_token` calls the GCP credentials endpoint and interprets 404 as "valid token, no credentials." This works but is semantically confusing and couples token validation to credential endpoints. A dedicated `/device/auth/validate` endpoint would be cleaner.

If a dedicated validation endpoint exists or is planned, consider using it instead.
cli/src/infragpt/main.py (1)
214-223: Consider handling optional GKE cluster gracefully.

If `fetch_gke_cluster_info_strict()` raises `GKEClusterError` with a 404 (no cluster configured), the entire CLI startup fails. For users who only need GCP credentials without GKE, this blocks usage.

Consider making GKE cluster optional:

```python
try:
    gke_cluster = fetch_gke_cluster_info_strict()
except GKEClusterError as e:
    if "No GKE cluster configured" in str(e):
        gke_cluster = None
        if verbose:
            console.print("[dim]No GKE cluster configured.[/dim]")
    else:
        raise
```
109-157: Significant code duplication between `refresh_token_if_needed` and `refresh_token_strict`.

These two functions share ~90% of their logic. Consider extracting a common helper and having both functions delegate to it with different error handling strategies.

🔎 Proposed refactor

```python
def _do_token_refresh(data: dict) -> dict:
    """Internal: performs the actual token refresh. Returns updated data dict."""
    refresh_token = data.get("refresh_token")
    if not refresh_token:
        raise ValueError("No refresh token available")

    expires_at = data.get("expires_at")
    if expires_at:
        try:
            expires_dt = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
            hours_until_expiry = (expires_dt - datetime.now(timezone.utc)).total_seconds() / 3600
            if hours_until_expiry >= TOKEN_REFRESH_THRESHOLD_HOURS:
                return data  # No refresh needed
        except ValueError:
            pass  # Refresh anyway

    server_url = data.get("server_url")
    client = InfraGPTClient(server_url=server_url)
    result = client.refresh_token(refresh_token)

    expires_at_new = datetime.now(timezone.utc) + timedelta(seconds=result.expires_in)
    data["access_token"] = result.access_token
    data["refresh_token"] = result.refresh_token
    data["expires_at"] = expires_at_new.isoformat()
    _save_auth_data(data)
    return data


def refresh_token_if_needed() -> bool:
    data = _load_auth_data()
    if not data:
        return False
    try:
        _do_token_refresh(data)
        return True
    except (ValueError, InfraGPTAPIError, httpx.RequestError):
        return False


def refresh_token_strict() -> None:
    data = _load_auth_data()
    if not data:
        raise TokenRefreshError("No auth data found")
    try:
        _do_token_refresh(data)
    except ValueError as e:
        raise TokenRefreshError(str(e)) from e
    except InfraGPTAPIError as e:
        raise TokenRefreshError(f"Failed to refresh token: {e.message}") from e
    except httpx.RequestError as e:
        raise TokenRefreshError(f"Failed to connect to server: {e}") from e
```

Also applies to: 326-374
143-148: Awkward timezone manipulation when calculating expiration.

The code strips timezone, adds timedelta, then re-adds timezone. This is unnecessary—`datetime` with timezone supports arithmetic directly.

🔎 Proposed simplification

```diff
-expires_at_new = datetime.now(timezone.utc).replace(microsecond=0)
-expires_at_new = expires_at_new.replace(tzinfo=None)
-expires_at_new = expires_at_new + timedelta(seconds=result.expires_in)
-expires_at_new = expires_at_new.replace(tzinfo=timezone.utc)
+expires_at_new = datetime.now(timezone.utc).replace(microsecond=0) + timedelta(seconds=result.expires_in)
```

Also, move the `from datetime import timedelta` import to the top of the file.

Also applies to: 359-364
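That aware-`datetime` arithmetic behaves identically can be verified directly; a quick stdlib check, with 3600 seconds standing in for `result.expires_in`:

```python
from datetime import datetime, timedelta, timezone

expires_in = 3600  # stand-in for result.expires_in

now = datetime.now(timezone.utc).replace(microsecond=0)

# Roundabout version under review: strip tz, add, re-attach tz
roundabout = now.replace(tzinfo=None) + timedelta(seconds=expires_in)
roundabout = roundabout.replace(tzinfo=timezone.utc)

# Direct version: aware datetimes support timedelta arithmetic as-is
direct = now + timedelta(seconds=expires_in)

print(direct == roundabout)           # True
print(direct.tzinfo is timezone.utc)  # True
```

Both paths produce the same instant, so the one-line form loses nothing.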
```python
def get_encryption_key() -> bytes:
    """Derive encryption key from machine-specific information."""
    hostname = platform.node()
    username = getpass.getuser()
    salt = f"{hostname}:{username}:infragpt"

    key_material = hashlib.sha256(salt.encode()).digest()
    # Fernet requires a 32-byte key, base64-encoded
    return urlsafe_b64encode(key_material)
```
Security concern: Encryption key derived from predictable inputs.
The encryption key is derived solely from hostname, username, and a static salt—all publicly knowable values. An attacker with read access to the auth file (e.g., via backup, malware, or shared system) can trivially derive the key and decrypt credentials.
Consider adding a randomly-generated component stored in a separate file (or system keyring if available) to provide actual secrecy.
🔎 Example improvement using a random secret component

```diff
+import secrets
+
+SECRET_FILE = Path.home() / ".config" / "infragpt" / ".secret"
+
 def get_encryption_key() -> bytes:
     """Derive encryption key from machine-specific information."""
+    # Load or create a random secret component
+    if SECRET_FILE.exists():
+        secret = SECRET_FILE.read_bytes()
+    else:
+        secret = secrets.token_bytes(32)
+        SECRET_FILE.parent.mkdir(parents=True, exist_ok=True)
+        SECRET_FILE.write_bytes(secret)
+        SECRET_FILE.chmod(0o600)
+
     hostname = platform.node()
     username = getpass.getuser()
-    salt = f"{hostname}:{username}:infragpt"
-
-    key_material = hashlib.sha256(salt.encode()).digest()
+    salt = f"{hostname}:{username}:infragpt".encode() + secret
+    key_material = hashlib.sha256(salt).digest()
     # Fernet requires a 32-byte key, base64-encoded
     return urlsafe_b64encode(key_material)
```

Committable suggestion skipped: line range outside the PR's diff.
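The concern is easy to demonstrate: a key derived only from hostname and username is reproducible by anyone who knows those values, while mixing in a random secret changes it. A stdlib-only sketch (the Fernet/base64 wrapping is omitted, and `derive_key` is a hypothetical mirror of the function under review):

```python
import hashlib
import secrets

def derive_key(hostname: str, username: str, secret: bytes = b"") -> bytes:
    """SHA-256 over machine identifiers, optionally mixed with a random secret."""
    salt = f"{hostname}:{username}:infragpt".encode() + secret
    return hashlib.sha256(salt).digest()

# Without a secret, anyone knowing hostname+username derives the same key
k1 = derive_key("laptop", "alice")
k2 = derive_key("laptop", "alice")
print(k1 == k2)  # True: fully predictable from public inputs

# With a random secret, hostname+username alone is no longer enough
s = secrets.token_bytes(32)
k3 = derive_key("laptop", "alice", s)
print(k3 == k1)  # False
```

The secret file (or a system keyring entry) then carries the actual secrecy, while the machine identifiers only bind the key to the host.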
```go
if code.ExpiresAt.Before(time.Now()) && code.Status == domain.DeviceCodeStatusPending {
	code.Status = domain.DeviceCodeStatusExpired
}
```
Inconsistent handling of expired device codes.
The code modifies the status to `Expired` at the application layer without persisting this change to the database. This creates inconsistency:

- Database record still has `status='pending'`
- Current request sees `status='expired'`
- Next request reading the same code might see `status='pending'` again

Consider one of these approaches:

- Remove the status modification and let `DeleteExpiredDeviceCodes` handle cleanup. The service layer can check expiration separately when needed.
- Persist the status change by adding an `UpdateStatus` method and calling it when detecting expiration.
- Document that `Expired` is a transient read-only status for API responses only, never persisted.
🤖 Prompt for AI Agents
In
services/backend/internal/devicesvc/supporting/postgres/device_code_repository.go
around lines 91-93, the code sets code.Status = domain.DeviceCodeStatusExpired
in-memory when ExpiresAt is past but never persists that change, causing
DB/application state divergence; either remove the in-memory status mutation and
rely on DeleteExpiredDeviceCodes (or treat Expired as transient), or add/use a
repository UpdateStatus method to persist the new status when expiration is
detected; implement the chosen approach consistently (remove mutation OR call
repository.UpdateStatus(code.ID, Expired) and handle DB errors) and document the
behavior.
```go
type DeviceCode struct {
	ID             uuid.UUID     `json:"id"`
	DeviceCode     string        `json:"device_code"`
	UserCode       string        `json:"user_code"`
	Status         string        `json:"status"`
	OrganizationID uuid.NullUUID `json:"organization_id"`
	UserID         uuid.NullUUID `json:"user_id"`
	ExpiresAt      time.Time     `json:"expires_at"`
	CreatedAt      time.Time     `json:"created_at"`
}

type DeviceToken struct {
	ID             uuid.UUID      `json:"id"`
	AccessToken    string         `json:"access_token"`
	RefreshToken   string         `json:"refresh_token"`
	OrganizationID uuid.UUID      `json:"organization_id"`
	UserID         uuid.UUID      `json:"user_id"`
	DeviceName     sql.NullString `json:"device_name"`
	ExpiresAt      time.Time      `json:"expires_at"`
	CreatedAt      time.Time      `json:"created_at"`
	RevokedAt      sql.NullTime   `json:"revoked_at"`
}
```
🛠️ Refactor suggestion | 🟠 Major
Remove JSON tags from sqlc-generated infrastructure models.
These database models have JSON tags, but according to the coding guidelines, JSON tags should only be used on API boundary structs (request/response types). These infrastructure models are mapped to clean domain types and should not have serialization concerns.
Since this is sqlc-generated code, update your sqlc.json configuration to disable JSON tag emission for this package:
```json
{
  "emit_json_tags": false
}
```

As per coding guidelines: "Do not add json tags to every struct, only when serializing/deserializing to/from json (e.g., API handlers, external JSON API processing)."
🤖 Prompt for AI Agents
In services/backend/internal/devicesvc/supporting/postgres/models.go around
lines 14 to 35, the sqlc-generated DB model structs include JSON tags which
violate the guideline to only use json tags on API boundary types; remove the
json struct tags from these models and regenerate the file by updating the sqlc
configuration for this package to disable JSON tag emission (set
"emit_json_tags": false in sqlc.json) and re-run sqlc so models are produced
without json tags.
```sql
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    device_code TEXT UNIQUE NOT NULL,
    user_code VARCHAR(10) UNIQUE NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'pending',
```
🛠️ Refactor suggestion | 🟠 Major
Add CHECK constraint for valid status values.
The status column accepts any VARCHAR(20) value but the application expects specific values ('pending', 'authorized', 'used', 'expired' based on the domain model). Add a CHECK constraint to enforce valid values at the database level:
🔎 Proposed fix
```diff
-    status VARCHAR(20) NOT NULL DEFAULT 'pending',
+    status VARCHAR(20) NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'authorized', 'used', 'expired')),
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-    status VARCHAR(20) NOT NULL DEFAULT 'pending',
+    status VARCHAR(20) NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'authorized', 'used', 'expired')),
```
🤖 Prompt for AI Agents
In
services/backend/internal/devicesvc/supporting/postgres/schema/device_code.sql
around line 5, the status column allows any VARCHAR(20) but must be restricted
to the domain values; add a CHECK constraint limiting status to
('pending','authorized','used','expired') (e.g., CONSTRAINT
device_code_status_check CHECK (status IN
('pending','authorized','used','expired'))), keep the DEFAULT 'pending', and
either include this CHECK inline in the CREATE TABLE definition or add it via an
ALTER TABLE statement to enforce valid statuses at the DB level.
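SQLite enforces the same kind of CHECK constraint, so the effect of the proposed fix is easy to try locally with the stdlib; the Postgres-specific pieces (UUID default, VARCHAR lengths) are dropped in this sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE device_codes (
    user_code TEXT PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'authorized', 'used', 'expired'))
)
""")

# A valid status is accepted
conn.execute("INSERT INTO device_codes (user_code, status) VALUES ('A1', 'authorized')")

# An invalid status is rejected at the database level
try:
    conn.execute("INSERT INTO device_codes (user_code, status) VALUES ('B2', 'bogus')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The constraint catches bad writes from any source, including manual SQL, which application-layer validation cannot.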
```sql
CREATE INDEX idx_device_tokens_access_token ON device_tokens (access_token);
CREATE INDEX idx_device_tokens_refresh_token ON device_tokens (refresh_token);
```
🛠️ Refactor suggestion | 🟠 Major
Remove redundant indexes.
The access_token and refresh_token columns already have UNIQUE constraints (lines 3-4), which automatically create indexes. The explicit CREATE INDEX statements here are redundant and waste storage.
🔎 Proposed fix
```diff
 CREATE TABLE device_tokens (
     id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
     access_token TEXT UNIQUE NOT NULL,
     refresh_token TEXT UNIQUE NOT NULL,
     organization_id UUID NOT NULL,
     user_id UUID NOT NULL,
     device_name VARCHAR(255),
     expires_at TIMESTAMP NOT NULL,
     created_at TIMESTAMP NOT NULL DEFAULT NOW(),
     revoked_at TIMESTAMP
 );
-CREATE INDEX idx_device_tokens_access_token ON device_tokens (access_token);
-CREATE INDEX idx_device_tokens_refresh_token ON device_tokens (refresh_token);
 CREATE INDEX idx_device_tokens_user_id ON device_tokens (user_id);
 CREATE INDEX idx_device_tokens_expires_at ON device_tokens (expires_at) WHERE revoked_at IS NULL;
```

🤖 Prompt for AI Agents
In
services/backend/internal/devicesvc/supporting/postgres/schema/device_token.sql
around lines 13 to 14, the explicit CREATE INDEX statements for access_token and
refresh_token are redundant because those columns already have UNIQUE
constraints which create indexes; remove the two CREATE INDEX lines
(idx_device_tokens_access_token and idx_device_tokens_refresh_token) from the
schema file so there is a single index created via the UNIQUE constraints and
avoid duplicate indexes and wasted storage.
Actionable comments posted: 3
🧹 Nitpick comments (3)
cli/src/infragpt/api_client.py (1)
170-182: Token validation uses GCP credentials endpoint as proxy.

The `validate_token` method validates tokens by calling the GCP credentials endpoint and treating 404 responses as valid tokens. While functional, this couples authentication validation to a specific resource endpoint. Consider adding a dedicated `/device/auth/validate` endpoint in the future for clearer separation of concerns.

cli/CLAUDE.md (1)
18-18: Consider more concise wording.

The phrase "a lot of comments" could be simplified to "many comments" or "excessive comments" for better readability.

🔎 Proposed alternatives

```diff
-- Do not write a lot of comments or docstrings, use any comment or docstrings when code is not self-explanatory
+- Do not write many comments or docstrings, use any comment or docstrings when code is not self-explanatory
```

or

```diff
-- Do not write a lot of comments or docstrings, use any comment or docstrings when code is not self-explanatory
+- Do not write excessive comments or docstrings, use any comment or docstrings when code is not self-explanatory
```

cli/src/infragpt/auth.py (1)
157-162: Simplify timezone handling for expiry calculation.

The timezone manipulation removes and re-adds timezone info unnecessarily. This can be simplified to a single operation.

🔎 Proposed simplification

```diff
 from datetime import timedelta
-expires_at_new = datetime.now(timezone.utc).replace(microsecond=0)
-expires_at_new = expires_at_new.replace(tzinfo=None)
-expires_at_new = expires_at_new + timedelta(seconds=result.expires_in)
-expires_at_new = expires_at_new.replace(tzinfo=timezone.utc)
+expires_at_new = datetime.now(timezone.utc) + timedelta(seconds=result.expires_in)
```

Apply the same simplification to lines 374-377 in `refresh_token_strict()`.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (7)
- cli/CLAUDE.md
- cli/pyproject.toml
- cli/src/infragpt/api_client.py
- cli/src/infragpt/auth.py
- cli/src/infragpt/main.py
- cli/tests/test_api_client.py
- cli/tests/test_auth_strict.py
🚧 Files skipped from review as they are similar to previous changes (1)
- cli/tests/test_api_client.py
🧰 Additional context used
📓 Path-based instructions (1)
cli/**/*.py
📄 CodeRabbit inference engine (cli/CLAUDE.md)
Use `uv` for running Python scripts
Files:
- cli/src/infragpt/auth.py
- cli/src/infragpt/main.py
- cli/src/infragpt/api_client.py
- cli/tests/test_auth_strict.py
🧠 Learnings (2)
📚 Learning: 2025-12-15T10:28:10.152Z
Learnt from: CR
Repo: 73ai/infragpt PR: 0
File: cli/CLAUDE.md:0-0
Timestamp: 2025-12-15T10:28:10.152Z
Learning: Applies to cli/**/*.py : Use `uv` for running Python scripts
Applied to files:
cli/CLAUDE.md
📚 Learning: 2025-12-15T10:28:40.036Z
Learnt from: CR
Repo: 73ai/infragpt PR: 0
File: services/agent/CLAUDE.md:0-0
Timestamp: 2025-12-15T10:28:40.036Z
Learning: Applies to services/agent/tests/**/*.py : Structure unit tests for individual agents and components; implement integration tests for gRPC communication and mock LLM interactions
Applied to files:
cli/tests/test_auth_strict.py
🧬 Code graph analysis (2)
cli/src/infragpt/auth.py (3)

- cli/src/infragpt/encryption.py (4): `encrypt_data` (22-28), `decrypt_data` (31-37), `secure_file_write` (40-44), `secure_file_read` (47-51)
- cli/src/infragpt/api_client.py (11): `InfraGPTClient` (54-182), `InfraGPTAPIError` (47-51), `GCPCredentials` (34-36), `GKEClusterInfo` (40-44), `refresh_token` (126-136), `initiate_device_flow` (100-108), `poll_device_flow` (110-124), `revoke_token` (138-144), `get_gcp_credentials` (146-155), `get_gke_cluster_info` (157-168), `validate_token` (170-182)
- cli/src/infragpt/exceptions.py (4): `AuthValidationError` (5-6), `TokenRefreshError` (9-10), `GCPCredentialError` (13-14), `GKEClusterError` (17-18)
cli/src/infragpt/main.py (6)

- cli/src/infragpt/container.py (6): `is_sandbox_mode` (68-74), `get_executor` (122-133), `cleanup_executor` (136-141), `cleanup_old_containers` (91-119), `DockerNotAvailableError` (40-43), `start` (182-227)
- cli/src/infragpt/tools.py (2): `get_executor` (29-38), `cleanup_executor` (41-51)
- cli/src/infragpt/auth.py (10): `login` (174-258), `logout` (261-278), `get_auth_status` (98-112), `is_authenticated` (80-95), `validate_token_with_api` (322-336), `refresh_token_strict` (339-387), `fetch_gcp_credentials_strict` (390-406), `fetch_gke_cluster_info_strict` (409-425), `write_gcp_credentials_file` (307-313), `cleanup_credentials` (316-319)
- cli/src/infragpt/exceptions.py (5): `AuthValidationError` (5-6), `TokenRefreshError` (9-10), `GCPCredentialError` (13-14), `GKEClusterError` (17-18), `ContainerSetupError` (21-22)
- services/agent/src/agents/base.py (1): `name` (33-35)
- cli/src/infragpt/agent.py (1): `run_shell_agent` (267-270)
🪛 LanguageTool
cli/CLAUDE.md
[style] ~18-~18: Consider using a synonym to be more concise.
Context: ...run` for running scripts - Do not write a lot of comments or docstrings, use any comment...
(A_LOT_OF)
🪛 Ruff (0.14.10)
cli/src/infragpt/auth.py
169-169: Consider moving this statement to an else block
(TRY300)
187-187: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
220-220: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
222-222: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
326-326: Avoid specifying long messages outside the exception class
(TRY003)
333-333: Avoid specifying long messages outside the exception class
(TRY003)
334-334: Avoid specifying long messages outside the exception class
(TRY003)
336-336: Avoid specifying long messages outside the exception class
(TRY003)
343-343: Avoid specifying long messages outside the exception class
(TRY003)
349-349: Avoid specifying long messages outside the exception class
(TRY003)
385-385: Avoid specifying long messages outside the exception class
(TRY003)
387-387: Avoid specifying long messages outside the exception class
(TRY003)
394-394: Avoid specifying long messages outside the exception class
(TRY003)
401-403: Avoid specifying long messages outside the exception class
(TRY003)
404-404: Avoid specifying long messages outside the exception class
(TRY003)
406-406: Avoid specifying long messages outside the exception class
(TRY003)
413-413: Avoid specifying long messages outside the exception class
(TRY003)
420-422: Avoid specifying long messages outside the exception class
(TRY003)
423-423: Avoid specifying long messages outside the exception class
(TRY003)
425-425: Avoid specifying long messages outside the exception class
(TRY003)
cli/src/infragpt/api_client.py
88-88: Do not catch blind exception: Exception
(BLE001)
94-94: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
96-98: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
178-178: Consider moving this statement to an else block
(TRY300)
cli/tests/test_auth_strict.py
34-34: Possible hardcoded password assigned to argument: "access_token"
(S106)
50-50: Possible hardcoded password assigned to argument: "access_token"
(S106)
68-68: Possible hardcoded password assigned to argument: "access_token"
(S106)
116-116: Possible hardcoded password assigned to argument: "access_token"
(S106)
133-133: Possible hardcoded password assigned to argument: "access_token"
(S106)
156-156: Possible hardcoded password assigned to argument: "access_token"
(S106)
173-173: Possible hardcoded password assigned to argument: "access_token"
(S106)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (5)
cli/pyproject.toml (1)
31-32: Security vulnerability addressed.

The upgrade to `cryptography>=44.0.1` resolves CVE-2024-12797 that affected version 44.0.0, and `httpx>=0.28.1` includes the SSL/verify fix from the 0.28.1 patch release. Well done!

cli/tests/test_auth_strict.py (1)
1-184: Comprehensive test coverage for authentication flows.

The test suite provides excellent coverage of authentication edge cases, error conditions, and success paths. Tests properly use mocking to isolate units under test and verify exception handling.

cli/src/infragpt/main.py (3)

91-141: Well-structured auth command group.

The authentication command group provides a clean CLI interface for login, logout, and status operations. The status command provides helpful user guidance and displays relevant authentication details.

223-254: Robust authentication and credential plumbing for sandbox mode.

The strict validation flow ensures authenticated users have valid tokens and necessary GCP/GKE credentials before starting the sandbox container. The separation of concerns between authentication validation, token refresh, and credential retrieval is well-designed.

263-275: Comprehensive error handling for authentication and container failures.

The error handling properly catches and surfaces authentication, credential, and container setup failures with actionable guidance for users.
```python
        except httpx.TimeoutException:
            raise InfraGPTAPIError(0, "Request timed out")
        except httpx.ConnectError:
            raise InfraGPTAPIError(
                0, f"Could not connect to server: {self.api_base_url}"
            )
```
🛠️ Refactor suggestion | 🟠 Major
Add exception chaining for better error context.
The timeout and connection error handlers should use raise ... from None to indicate these are deliberate exception transformations, not errors during exception handling.
🔎 Proposed fix

```diff
         except httpx.TimeoutException:
-            raise InfraGPTAPIError(0, "Request timed out")
+            raise InfraGPTAPIError(0, "Request timed out") from None
         except httpx.ConnectError:
             raise InfraGPTAPIError(
                 0, f"Could not connect to server: {self.api_base_url}"
-            )
+            ) from None
```
+ ) from None📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        except httpx.TimeoutException:
            raise InfraGPTAPIError(0, "Request timed out") from None
        except httpx.ConnectError:
            raise InfraGPTAPIError(
                0, f"Could not connect to server: {self.api_base_url}"
            ) from None
```
🧰 Tools
🪛 Ruff (0.14.10)
94-94: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
96-98: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In cli/src/infragpt/api_client.py around lines 93 to 98, the except handlers for
httpx.TimeoutException and httpx.ConnectError must use exception chaining
suppression; change the raises to use "raise InfraGPTAPIError(... ) from None"
so these are explicit transformations and not chained to the original exception,
keeping the same messages and status codes.
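What `raise ... from None` changes can be shown in a short runnable sketch. The `APIError` class below is a hypothetical stand-in for `InfraGPTAPIError` (whose real constructor takes a status code and message):

```python
class APIError(Exception):
    """Stand-in for InfraGPTAPIError; the real class takes (status_code, message)."""

def request():
    try:
        raise TimeoutError("socket timed out")  # stand-in for httpx.TimeoutException
    except TimeoutError:
        # "from None" marks this as a deliberate transformation: tracebacks
        # no longer print "During handling of the above exception, another
        # exception occurred" followed by the raw transport error.
        raise APIError("Request timed out") from None

caught = None
try:
    request()
except APIError as err:
    caught = err

# from None sets __cause__ to None and __suppress_context__ to True;
# the original exception is still reachable via __context__ for debugging.
assert caught.__suppress_context__ is True
assert caught.__cause__ is None
assert isinstance(caught.__context__, TimeoutError)
```

Using `from err` instead would set `__cause__` and print both tracebacks, which is the better choice when the underlying error carries useful detail.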
Summary by CodeRabbit
New Features
Chores
Tests