
feat: add google live backend path#95

Open
chilu18 wants to merge 1 commit into main from feat/google-live-vertex-path

@chilu18
Collaborator

@chilu18 chilu18 commented Mar 16, 2026

Summary

  • add a Worker bootstrap endpoint for signed Google live sessions
  • add a new FastAPI service for Gemini/Vertex realtime sessions via the Google Gen AI SDK
  • route dashboard Gemini launchpad sessions through the new backend while keeping the existing OpenAI realtime path intact
  • document setup, usage, and deployment for the new Google live path

Validation

  • :
  • :
  • :
  • :
  • :
  • :
  • :
  • : in an isolated venv

Notes

  • OpenAI realtime remains unchanged.
  • I did not run a live Vertex AI end-to-end session because this environment does not have production Google Cloud credentials configured.

@chilu18
Collaborator Author

chilu18 commented Mar 16, 2026

OpenCTO Autonomous PR Review (2026-03-16T23:21:00.200Z)

Decision: changes_requested

The PR adds significant new functionality for Google live backend paths including a new FastAPI service, routing changes, and extensive documentation. However, the validation section in the PR description is incomplete, and the author did not perform live end-to-end testing using actual Google Cloud credentials, which raises concerns about the reliability and correctness of the implementation in a production environment.

Concerns:

  1. Validation points in the PR description are incomplete or missing, providing no confidence in test coverage or successful execution.
  2. No live end-to-end testing was performed with production Google Cloud credentials, which is critical for a new backend path involving Google Vertex AI.
  3. Partial integration risks breaking existing functionality. The PR claims the OpenAI realtime path remains unchanged, but it provides no regression-test evidence to support that.
  4. The sheer size of the additions (+hundreds of lines in multiple files) necessitates clear, thorough testing and validation before approval.
  5. More detailed testing documentation or test results should be provided, especially for the critical Google live session paths.



from typing import Any, AsyncIterator, Protocol


# Structural interface for one live model session.
class LiveSession(Protocol):
    async def send_text(self, text: str) -> None: ...
    async def send_audio(self, data: bytes, mime_type: str) -> None: ...
    async def send_video(self, data: bytes, mime_type: str) -> None: ...
    async def send_tool_responses(self, responses: list[dict[str, Any]]) -> None: ...
    async def receive(self) -> AsyncIterator[Any]: ...


# Async context manager that yields a LiveSession.
class LiveSessionContext(Protocol):
    async def __aenter__(self) -> LiveSession: ...
    async def __aexit__(self, exc_type, exc, tb) -> None: ...


# Factory that opens a session for a given model and setup config.
class LiveSessionFactory(Protocol):
    def connect(self, model: str, setup_config: dict[str, Any]) -> LiveSessionContext: ...

# Fragment from a separate hunk: close a resource whether close_fn is sync or async.
if callable(close_fn):
    result = close_fn()
    if asyncio.iscoroutine(result):
        await result

@greptile-apps

greptile-apps bot commented Mar 16, 2026

Greptile Summary

This PR adds a full Google Vertex AI Live (Gemini) realtime voice path alongside the existing OpenAI realtime path. It introduces a new FastAPI Python service (opencto-google-live-backend) that proxies authenticated WebSocket sessions to Google Vertex AI, a new Cloudflare Worker endpoint (/api/v1/google-live/session) that mints short-lived HMAC-SHA256 session tokens for the backend, and a new GoogleLiveAdapter in the dashboard that bootstraps those tokens and manages audio capture/playback/tool-call handling.
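The signed-token handshake described above could look roughly like the following on the backend side. This is a minimal sketch, not the code in this PR: the `body.signature` token layout, the function names `sign_session_token` and `verify_session_token`, and the payload fields other than `exp` are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Optional


def _b64url(data: bytes) -> str:
    # URL-safe base64 without padding, so the token can travel in a query string.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_session_token(payload: dict[str, Any], secret: bytes) -> str:
    # Hypothetical layout: "<base64url(json payload)>.<base64url(hmac-sha256)>".
    body = _b64url(json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8"))
    signature = hmac.new(secret, body.encode("ascii"), hashlib.sha256).digest()
    return f"{body}.{_b64url(signature)}"


def verify_session_token(token: str, secret: bytes, now: Optional[float] = None) -> dict[str, Any]:
    try:
        body, signature = token.split(".")
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    expected = hmac.new(secret, body.encode("ascii"), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(body))
    # "exp" is seconds since the epoch; a missing value is treated as already expired.
    if payload.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload
```

Checking the signature before parsing the payload means untrusted JSON is never decoded unless it was produced by a holder of the shared secret.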

Key findings:

  • Logic bug (double rate limiting): index.ts enforces google_live_session rate-limiting in the router, then calls googleLive.createGoogleLiveSession → mintGoogleLiveSession, which enforces the same rate limit key a second time. Every request consumes two credits, effectively halving the configured limit. The correct pattern (used by the OpenAI realtime path) is to only enforce the limit in one place.
  • Unpinned Python dependencies: requirements.txt specifies all six packages without version constraints, making the service non-reproducible and fragile to upstream breaking changes.
  • Minor redundancy: The worker response includes both wsUrl and websocketUrl set to the same value; the frontend only reads wsUrl, making websocketUrl dead code.

Confidence Score: 3/5

  • Needs fixes before merging — a logic bug causes every session request to be double-counted against the rate limiter.
  • The overall architecture and security model (signed short-lived tokens, HMAC verification, model allow-listing) are sound. However, the double rate-limit enforcement in index.ts is a concrete logic bug that makes the configured session limit operate at half its intended value. Combined with unpinned Python dependencies that could break the service on reinstall, the PR needs at least the rate-limit fix before merge.
  • opencto/opencto-api-worker/src/index.ts (double rate-limit bug) and opencto/opencto-google-live-backend/requirements.txt (unpinned deps)

Important Files Changed

| Filename | Overview |
| --- | --- |
| opencto/opencto-api-worker/src/index.ts | Added /api/v1/google-live/session route; contains a double rate-limit bug: the router enforces the limit AND mintGoogleLiveSession enforces it again on the same key, halving the effective limit. |
| opencto/opencto-api-worker/src/googleLive.ts | New module for minting signed HMAC-SHA256 session tokens for the Google Live backend; logic is solid but returns a redundant websocketUrl alias alongside wsUrl. |
| opencto/opencto-google-live-backend/app.py | New FastAPI service that proxies authenticated WebSocket sessions to Google Vertex AI Live. Token verification, model allow-listing, and WebSocket bidirectional bridging are well-implemented. |
| opencto/opencto-google-live-backend/requirements.txt | All six dependencies are unpinned, making builds non-reproducible and vulnerable to unexpected breaking changes on future installs. |
| opencto/opencto-dashboard/src/lib/realtime/googleAdapter.ts | New frontend adapter for the Google Live backend; bootstraps a session token from the worker, opens a WebSocket, and handles audio capture, playback, tool calls, and transcriptions correctly. |
| opencto/opencto-dashboard/src/lib/realtime/shared.ts | Shared realtime utilities extended with Google Live bootstrap logic, model normalization, and audio helpers; changes look correct and backward-compatible. |
| opencto/opencto-api-worker/src/tests/googleLiveSession.test.ts | Comprehensive unit tests covering config errors, happy-path bootstrap, model fall-back; test coverage is solid. |
| opencto/opencto-google-live-backend/tests/test_api.py | Integration tests for the FastAPI service using a fake factory; covers health check, WebSocket setup, text, audio, and tool-response flows adequately. |
| opencto/opencto-api-worker/src/types.ts | Five new optional env vars added for Google Live configuration; all are typed correctly as optional strings. |

Sequence Diagram

sequenceDiagram
    participant Browser as Dashboard
    participant Worker as API Worker
    participant Python as Google Live Backend
    participant Google as Google Vertex AI

    Browser->>Worker: POST /api/v1/google-live/session
    Worker->>Worker: Authenticate + rate limit (x2 bug)
    Worker->>Worker: Select allowed model
    Worker->>Worker: Sign short-lived token
    Worker-->>Browser: wsUrl + signed token

    Browser->>Python: WebSocket /ws/live (token in query)
    Python->>Python: Verify token + expiry
    Browser->>Python: setup frame (model, instructions, tools)
    Python->>Google: live.connect(model, config)
    Python-->>Browser: setupComplete

    loop Realtime streaming
        Browser->>Python: audio chunks
        Python->>Google: send_realtime_input
        Google-->>Python: audio + transcriptions
        Python-->>Browser: serverContent frame
        Google-->>Python: toolCall
        Python-->>Browser: toolCall frame
        Browser->>Python: toolResponse
        Python->>Google: send_tool_response
    end

Last reviewed commit: e6ec5f3

Comment on lines +311 to +319
if (path === '/api/v1/google-live/session' && method === 'POST') {
const body = await request.clone().json().catch(() => ({})) as { workspaceId?: string }
await enforceRateLimit(ctx, 'google_live_session', {
limit: parseRateLimit(ctx.env.RATE_LIMIT_GOOGLE_LIVE_SESSIONS_PER_MINUTE, DEFAULT_REALTIME_RATE_LIMIT_PER_MINUTE),
windowSeconds: 60,
workspaceId: body.workspaceId,
})
return await googleLive.createGoogleLiveSession(request, ctx)
}

Double rate-limit enforcement per request

The route handler enforces google_live_session rate limiting here (lines 312–317), and then calls googleLive.createGoogleLiveSession → mintGoogleLiveSession, which enforces the same rate limit key a second time (see googleLive.ts lines 47–51).

Every request therefore consumes two credits against the same counter, effectively halving whatever limit is configured in RATE_LIMIT_GOOGLE_LIVE_SESSIONS_PER_MINUTE. For comparison, the realtime-token route (lines 300–308) only does rate-limiting in the router and mintRealtimeToken does not call enforceRateLimit internally — that is the correct pattern.

Remove the enforceRateLimit call from the router here, keeping only the one inside mintGoogleLiveSession, so that the limit logic remains self-contained in the feature module.

Suggested change
- if (path === '/api/v1/google-live/session' && method === 'POST') {
-   const body = await request.clone().json().catch(() => ({})) as { workspaceId?: string }
-   await enforceRateLimit(ctx, 'google_live_session', {
-     limit: parseRateLimit(ctx.env.RATE_LIMIT_GOOGLE_LIVE_SESSIONS_PER_MINUTE, DEFAULT_REALTIME_RATE_LIMIT_PER_MINUTE),
-     windowSeconds: 60,
-     workspaceId: body.workspaceId,
-   })
-   return await googleLive.createGoogleLiveSession(request, ctx)
- }
+ if (path === '/api/v1/google-live/session' && method === 'POST') {
+   return await googleLive.createGoogleLiveSession(request, ctx)
+ }

Comment on lines +1 to +6
fastapi
uvicorn[standard]
python-dotenv
pytest
httpx
google-genai

Unpinned dependency versions

All six packages are listed without version constraints. This makes builds non-reproducible — a later pip install can silently pull in a breaking release of google-genai, fastapi, or uvicorn.

Pin versions to the exact revisions that were tested, for example:

Suggested change
- fastapi
- uvicorn[standard]
- python-dotenv
- pytest
- httpx
- google-genai
+ fastapi==0.115.12
+ uvicorn[standard]==0.34.0
+ python-dotenv==1.1.0
+ pytest==8.3.5
+ httpx==0.28.1
+ google-genai==1.10.0
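
To keep pins from drifting back out, a small check could run in CI. The sketch below is an illustration only: the helper name `unpinned_requirements` is hypothetical, and it assumes a simple requirements file with one `name==version` entry per line (no `-r` includes, URLs, or environment markers).

```python
import re

# Matches "name==version" with an optional extras group such as "uvicorn[standard]==0.34.0".
PINNED = re.compile(r"^[A-Za-z0-9_.\-]+(\[[A-Za-z0-9_,.\-]+\])?==[^\s=]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return every requirement line that lacks an exact '==' pin."""
    missing = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace; skip blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not PINNED.match(line):
            missing.append(line)
    return missing
```

Run against the original six-line file, this would report all six entries; run against the suggested replacement, it would report none.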

Comment on lines +77 to +89
return jsonResponse({
provider: 'google_vertex',
mode: 'vertex_live',
model: selectedModel,
wsUrl,
websocketUrl: wsUrl,
sessionToken,
workspaceId,
sessionId,
traceId: ctx.traceContext.traceId,
expiresAt: new Date(payload.exp * 1000).toISOString(),
})
}

Redundant websocketUrl field in response

wsUrl and websocketUrl are always set to the exact same value (both assigned wsUrl on lines 81–82). The GoogleLiveSessionBootstrap interface in shared.ts only references wsUrl, and googleAdapter.ts only reads bootstrap.wsUrl. The websocketUrl alias is dead code that unnecessarily inflates the response payload.

Suggested change
- return jsonResponse({
-   provider: 'google_vertex',
-   mode: 'vertex_live',
-   model: selectedModel,
-   wsUrl,
-   websocketUrl: wsUrl,
-   sessionToken,
-   workspaceId,
-   sessionId,
-   traceId: ctx.traceContext.traceId,
-   expiresAt: new Date(payload.exp * 1000).toISOString(),
- })
- }
+ return jsonResponse({
+   provider: 'google_vertex',
+   mode: 'vertex_live',
+   model: selectedModel,
+   wsUrl,
+   sessionToken,
+   workspaceId,
+   sessionId,
+   traceId: ctx.traceContext.traceId,
+   expiresAt: new Date(payload.exp * 1000).toISOString(),
+ })


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e6ec5f3700

Comment on lines +313 to +317
await enforceRateLimit(ctx, 'google_live_session', {
limit: parseRateLimit(ctx.env.RATE_LIMIT_GOOGLE_LIVE_SESSIONS_PER_MINUTE, DEFAULT_REALTIME_RATE_LIMIT_PER_MINUTE),
windowSeconds: 60,
workspaceId: body.workspaceId,
})


P1: Remove duplicate Google Live session rate-limit enforcement

This route already increments the google_live_session bucket before dispatch, but mintGoogleLiveSession in opencto-api-worker/src/googleLive.ts enforces the same bucket again, and enforceRateLimit increments on every check. That means each /api/v1/google-live/session request consumes two quota units, so users hit 429 much earlier than configured (with current defaults, about 6 successful requests/minute instead of 12), which can break normal reconnect/bootstrap flows.
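
The arithmetic behind "6 instead of 12" can be reproduced with a toy fixed-window counter. This is only a sketch: `FixedWindowLimiter` and `successful_requests` are hypothetical stand-ins, and the one assumption carried over from the comment above is that every enforcement call increments the shared bucket.

```python
class FixedWindowLimiter:
    """Toy counter for a single window; every check increments the bucket."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.count = 0

    def enforce(self) -> None:
        self.count += 1
        if self.count > self.limit:
            raise RuntimeError("429: rate limit exceeded")


def successful_requests(limit: int, checks_per_request: int, attempts: int) -> int:
    """Count requests that pass when each one triggers `checks_per_request` checks."""
    limiter = FixedWindowLimiter(limit)
    succeeded = 0
    for _ in range(attempts):
        try:
            for _ in range(checks_per_request):
                limiter.enforce()
        except RuntimeError:
            continue  # this request was rejected; later ones will be too
        succeeded += 1
    return succeeded


# One check per request: the full limit of 12 is usable.
# Two checks per request (router + feature module): only 6 requests get through.
single = successful_requests(limit=12, checks_per_request=1, attempts=20)
double = successful_requests(limit=12, checks_per_request=2, attempts=20)
```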


Comment on lines +184 to +185
voice: this.config.voice,
agentProfile: this.config.agentProfile ?? 'dispatch',


P2: Send Gemini voice setting in the shape the backend reads

The setup payload now sends the selected voice as top-level setup.voice, but the new backend only derives voice from setup.generationConfig.speechConfig via extract_voice_name(generation_config) in opencto-google-live-backend/app.py. In the current flow this means speech_config is never set, so Gemini Live sessions ignore the user’s chosen voice and fall back to the provider default.
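
One way to close the gap on the backend side would be to accept both shapes. The sketch below is not the `extract_voice_name` function from this PR, whose body is not shown here; `resolve_voice_name` is a hypothetical helper, and the nested `voiceConfig.prebuiltVoiceConfig.voiceName` path is an assumption about how the speech config is laid out.

```python
from typing import Any, Optional


def _as_dict(value: Any) -> dict[str, Any]:
    # Treat anything that is not a dict as "absent" so malformed input cannot raise.
    return value if isinstance(value, dict) else {}


def resolve_voice_name(setup: dict[str, Any]) -> Optional[str]:
    """Prefer the nested speech config, then fall back to a top-level setup.voice."""
    generation_config = _as_dict(setup.get("generationConfig"))
    speech_config = _as_dict(generation_config.get("speechConfig"))
    voice_config = _as_dict(speech_config.get("voiceConfig"))  # assumed nesting
    prebuilt = _as_dict(voice_config.get("prebuiltVoiceConfig"))  # assumed nesting
    nested = prebuilt.get("voiceName")
    if isinstance(nested, str) and nested.strip():
        return nested.strip()
    top_level = setup.get("voice")
    if isinstance(top_level, str) and top_level.strip():
        return top_level.strip()
    return None
```

The alternative is to leave the backend alone and have the dashboard send the voice inside generationConfig.speechConfig; either way, only one side should need to change.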


@chilu18
Collaborator Author

chilu18 commented Mar 17, 2026

OpenCTO Autonomous PR Review (2026-03-17T00:42:53.726Z)

Decision: changes_requested

The PR adds a new Google live backend path with associated API worker services, documentation, and dashboard routing. While the implementation appears comprehensive and well-documented, the lack of end-to-end live Vertex AI session testing due to missing production Google Cloud credentials is a critical concern that needs to be addressed before approval.

Concerns:

  1. No live end-to-end testing of the Vertex AI integration was performed due to missing production Google Cloud credentials.
  2. Validation section of the PR body lacks concrete test results or evidence of extensive automated testing beyond isolated virtual environment checks.
  3. The potential impact on production environment stability and functionality is uncertain without real environment testing.
  4. It is unclear if appropriate access controls and security measures have been rigorously tested in the new backend path.

@chilu18
Collaborator Author

chilu18 commented Mar 17, 2026

OpenCTO Autonomous PR Review (2026-03-17T00:53:03.788Z)

Decision: changes_requested

The PR introduces a new backend path for Google live sessions and related documentation and tests, but it lacks critical validation steps involving live end-to-end tests with Google Cloud credentials. The implementation is extensive, but the absence of real environment testing raises concerns about production readiness and integration robustness.

Concerns:

  1. No live Vertex AI end-to-end session testing was conducted due to missing production Google Cloud credentials.
  2. Validation section is incomplete or placeholders are present, indicating insufficient testing documentation or results.
  3. Potential risk of integration issues since the new Google live path has not been fully verified in a production-like environment.
  4. The PR adds a substantial amount of code and docs without clear evidence of comprehensive validation across all affected components.
  5. It is recommended to perform full end-to-end testing with valid Google Cloud credentials and update the validation documentation accordingly.

@chilu18
Collaborator Author

chilu18 commented Mar 17, 2026

OpenCTO Autonomous PR Review (2026-03-17T22:31:55.096Z)

Decision: changes_requested

The PR introduces a significant new backend path for Google live sessions with extensive documentation and code changes. However, there are concerns regarding validation and testing completeness that need to be addressed before approval.

Concerns:

  1. The Validation section in the PR body is largely empty and lacks detail on tests performed, making it unclear how well the feature was tested.
  2. No live end-to-end Vertex AI session was run due to lack of production Google Cloud credentials, leaving the critical integration unvalidated under real conditions.
  3. Given the large amount of new code (e.g., 186 lines in googleLive.ts, many docs and tests), the lack of live validation introduces risk for undetected issues.
  4. It is recommended to add either mocked integration tests simulating Vertex AI behavior or coordinate access to a suitable environment for live testing prior to merging.
  5. Clarification on the impact of these changes on existing workflows, especially how fallback or error handling behaves if the Google live path fails, would strengthen confidence.
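
For concern 4, a mocked session that simulates the provider could be as small as the sketch below. All `Fake*` classes and `run_demo` are hypothetical and are not the fake factory already used in tests/test_api.py; the sketch also assumes `receive()` is consumed as an async generator, which the PR does not confirm.

```python
import asyncio
from typing import Any, AsyncIterator


class FakeLiveSession:
    """Records everything sent and echoes it back from receive()."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, Any]] = []

    async def send_text(self, text: str) -> None:
        self.sent.append(("text", text))

    async def send_audio(self, data: bytes, mime_type: str) -> None:
        # Store only the size so assertions stay readable.
        self.sent.append(("audio", (len(data), mime_type)))

    async def send_tool_responses(self, responses: list[dict[str, Any]]) -> None:
        self.sent.append(("tool_responses", responses))

    async def receive(self) -> AsyncIterator[Any]:
        for kind, value in list(self.sent):
            yield {"echo": kind, "value": value}


class FakeLiveSessionContext:
    def __init__(self, session: FakeLiveSession) -> None:
        self.session = session
        self.closed = False

    async def __aenter__(self) -> FakeLiveSession:
        return self.session

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.closed = True


class FakeLiveSessionFactory:
    def __init__(self) -> None:
        self.connections: list[tuple[str, dict[str, Any]]] = []

    def connect(self, model: str, setup_config: dict[str, Any]) -> FakeLiveSessionContext:
        # Record what the caller asked for so tests can assert on model and config.
        self.connections.append((model, setup_config))
        return FakeLiveSessionContext(FakeLiveSession())


async def run_demo() -> list[Any]:
    factory = FakeLiveSessionFactory()
    async with factory.connect("example-model", {"tools": []}) as session:
        await session.send_text("hello")
        await session.send_audio(b"\x00\x01", "audio/pcm")
        return [event async for event in session.receive()]
```

Calling `asyncio.run(run_demo())` exercises connect, send, and receive without any Google Cloud credentials, which is the gap the reviews above keep pointing at.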
