@@ -954,7 +954,11 @@ gh run watch RUN_ID --repo OWNER/REPO
{%- endif %}
| `make lint` | Check code quality |
| `make setup-dev-env` | Set up dev infrastructure (Terraform) |
- | `make deploy` | Deploy to dev (requires human approval) |
+ {%- if cookiecutter.deployment_target != 'none' %}
+ | `make deploy` | Deploy to dev |
+ {%- else %}
+ | `uvx agent-starter-pack enhance` (or equivalent) | Add a deployment target to enable `make deploy` |
+ {%- endif %}

## Testing Your Deployed Agent

@@ -964,8 +968,6 @@ After deployment, you can test your agent. The method depends on your deployment

The deployment endpoint is stored in `deployment_metadata.json` after `make deploy` completes.

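For example, you can pull the endpoint into a shell variable (a minimal sketch; the key name below is hypothetical, so inspect the file first to find the actual field for your deployment target):

```bash
# Inspect the metadata written by `make deploy` (the key name below is an assumption)
cat deployment_metadata.json
SERVICE_URL=$(jq -r '.service_url' deployment_metadata.json)
```
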
- {%- if cookiecutter.deployment_target == "agent_engine" %}
-
### Testing Agent Engine Deployment

Your agent is deployed to Vertex AI Agent Engine.
@@ -1008,8 +1010,6 @@ make playground
# Open http://localhost:8000 in your browser
```

- {%- elif cookiecutter.deployment_target == "cloud_run" %}
-
### Testing Cloud Run Deployment

Your agent is deployed to Cloud Run.
@@ -1087,57 +1087,6 @@ gcloud beta iap web add-iam-policy-binding \
```

**Note:** Use `iap web add-iam-policy-binding` for IAP access, not `run services add-iam-policy-binding` (which is for `roles/run.invoker`).
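
For comparison, the direct Cloud Run grant (used when callers invoke the service with an identity token rather than through IAP) looks like this; the member is a placeholder:

```bash
# Grants direct invocation rights (roles/run.invoker), NOT IAP access
gcloud run services add-iam-policy-binding {{cookiecutter.project_name}} \
  --region=YOUR_REGION \
  --member="user:alice@example.com" \
  --role="roles/run.invoker"
```
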
- {%- if cookiecutter.is_adk and cookiecutter.session_type == "cloud_sql" %}
-
- ### Testing Cloud SQL Session Persistence
-
- Your agent uses Cloud SQL (PostgreSQL) for session storage. To verify sessions persist correctly:
-
- **1. Test Session Creation and Resume:**
-
- ```bash
- # First request - create session and have a conversation
- curl -X POST $SERVICE_URL/run \
-   -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
-   -H "Content-Type: application/json" \
-   -d '{"input": "Qualify lead #123"}' | jq -r '.session_id'
-
- # Save the session_id from the response, then test resume:
- curl -X POST $SERVICE_URL/run \
-   -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
-   -H "Content-Type: application/json" \
-   -d '{"input": "What was the lead score?", "session_id": "SESSION_ID_FROM_ABOVE"}'
- ```
-
- The agent should recall details from the first conversation.
-
- **2. Verify Cloud SQL Connection:**
-
- ```bash
- # Check Cloud Run service logs for successful DB connection
- gcloud run services logs read {{cookiecutter.project_name}} \
-   --project=YOUR_DEV_PROJECT_ID \
-   --region=YOUR_REGION \
-   --limit=50 | grep -i "database\|cloud_sql"
-
- # Verify Cloud SQL instance is running
- gcloud sql instances describe {{cookiecutter.project_name}}-db-dev \
-   --project=YOUR_DEV_PROJECT_ID
- ```
-
- **3. Common Cloud SQL Issues:**
-
- | Issue | Symptom | Resolution |
- |-------|---------|------------|
- | Connection timeout | `Connection refused` errors | Check Cloud SQL instance is in same region as Cloud Run |
- | IAM auth failed | `Login failed` errors | Verify service account has `roles/cloudsql.client` |
- | Session not found | `Session does not exist` | Verify session_id matches and DB tables were created |
- | Volume mount failed | `cloudsql volume not found` | Check terraform applied Cloud SQL volume configuration |
-
- {%- endif %}
-
- {%- endif %}
- {%- if cookiecutter.is_a2a %}

### Testing A2A Protocol Agents

@@ -1198,7 +1147,6 @@ curl -X POST \
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  "$SERVICE_URL/a2a/app/.well-known/agent-card.json"
```
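
To go beyond fetching the card, you can send a test message over JSON-RPC. A hedged sketch, assuming an A2A server implementing the spec's `message/send` method at the same base path (verify the payload shape against your A2A library version):

```bash
curl -X POST "$SERVICE_URL/a2a/app" \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": "1",
    "method": "message/send",
    "params": {
      "message": {
        "role": "user",
        "messageId": "test-msg-1",
        "parts": [{"kind": "text", "text": "Hello, agent!"}]
      }
    }
  }'
```
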
- {%- endif %}

### Running Load Tests

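A hedged example, assuming the Locust-based load test that agent-starter-pack templates generate under `tests/load_test/` (the path is an assumption; adjust it and the flags to your project):

```bash
# 10 concurrent users, spawning 2/sec, for 60 seconds (standard Locust flags)
uv run locust -f tests/load_test/load_test.py \
  -H $SERVICE_URL --headless -u 10 -r 2 -t 60s
```
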
@@ -1238,245 +1186,48 @@ Your agent currently runs as an interactive service. However, many use cases req

### Adding an /invoke Endpoint

- To enable batch/event processing, add an `/invoke` endpoint to your FastAPI app that auto-detects the input format:
-
- ```python
- # Add to {{cookiecutter.agent_directory}}/fast_api_app.py
-
- from typing import List, Any, Dict
- import asyncio
- import base64
- import json
- from pydantic import BaseModel
-
- # Request/Response models for different sources
- class BQResponse(BaseModel):
-     replies: List[str]
-
- # Concurrency control (module-level for reuse)
- MAX_CONCURRENT = 10
- semaphore = asyncio.Semaphore(MAX_CONCURRENT)
-
-
- async def run_agent(prompt: str) -> str:
-     """Run the agent with concurrency control.
-
-     Uses Runner + InMemorySessionService for stateless batch processing.
-     Each invocation creates a fresh session (no conversation history).
-     """
-     async with semaphore:
-         try:
-             from {{cookiecutter.agent_directory}}.agent import root_agent
-             from google.adk.runners import Runner
-             from google.adk.sessions import InMemorySessionService
-             from google.genai import types as genai_types
-
-             # Create ephemeral session for this request
-             session_service = InMemorySessionService()
-             await session_service.create_session(
-                 app_name="app", user_id="invoke_user", session_id="invoke_session"
-             )
-             runner = Runner(
-                 agent=root_agent, app_name="app", session_service=session_service
-             )
-
-             # Run agent and collect final response
-             final_response = ""
-             async for event in runner.run_async(
-                 user_id="invoke_user",
-                 session_id="invoke_session",
-                 new_message=genai_types.Content(
-                     role="user",
-                     parts=[genai_types.Part.from_text(text=prompt)]
-                 ),
-             ):
-                 if event.is_final_response() and event.content and event.content.parts:
-                     final_response = event.content.parts[0].text
-             return final_response
-         except Exception as e:
-             return json.dumps({"error": str(e)})
+ Add an `/invoke` endpoint to `{{cookiecutter.agent_directory}}/fast_api_app.py` for batch/event processing. The endpoint auto-detects the input format (BigQuery Remote Function, Pub/Sub, Eventarc, or direct HTTP).

+ **Core pattern:** Create a `run_agent` helper using `Runner` + `InMemorySessionService` for stateless processing, with a semaphore for concurrency control (a condensed sketch of the helper follows the routing example below). Then route by request shape:

+ ```python
@app.post("/invoke")
async def invoke(request: Dict[str, Any]):
1302- """
1303- Universal endpoint that auto-detects input format and routes accordingly.
1304-
1305- Supported formats:
1306- - BigQuery Remote Function: {"calls": [[row1], [row2], ...]}
1307- - Pub/Sub Push: {"message": {"data": "base64...", "attributes": {...}}}
1308- - Eventarc: {"data": {...}, "type": "google.cloud.storage.object.v1.finalized"}
1309- - Direct HTTP: {"input": "your prompt here"}
1310- """
1311-
1312- # === BigQuery Remote Function ===
1313- # Format: {"calls": [[col1, col2], [col1, col2], ...]}
1314- if " calls" in request:
1315- async def process_row (row_data : List[Any]) -> str :
1316- prompt = f " Analyze: { row_data} "
1317- return await run_agent(prompt)
1318-
1319- results = await asyncio.gather(
1320- * [process_row(row) for row in request[" calls" ]]
1321- )
1322- return BQResponse(replies = results)
1323-
1324- # === Pub/Sub Push Subscription ===
1325- # Format: {"message": {"data": "base64...", "attributes": {...}}, "subscription": "..."}
1326- if " message" in request:
1327- message = request[" message" ]
1328- # Decode base64 data
1329- data_b64 = message.get(" data" , " " )
1330- try :
1331- data = base64.b64decode(data_b64).decode(" utf-8" )
1332- payload = json.loads(data)
1333- except Exception :
1334- payload = data_b64 # Use raw if not JSON
1335-
1336- attributes = message.get(" attributes" , {})
1337- prompt = f " Process event: { payload} \n Attributes: { attributes} "
1338-
1339- result = await run_agent(prompt)
1340-
1341- # Pub/Sub expects 2xx response to acknowledge
1342- return {" status" : " success" , " result" : result}
1343-
1344- # === Eventarc (Cloud Events) ===
1345- # Format: {"data": {...}, "type": "google.cloud.storage.object.v1.finalized", ...}
1346- if " type" in request and request.get(" type" , " " ).startswith(" google.cloud." ):
1347- event_type = request[" type" ]
1348- event_data = request.get(" data" , {})
1349-
1350- # Example: Cloud Storage event
1351- if " storage" in event_type:
1352- bucket = event_data.get(" bucket" , " unknown" )
1353- name = event_data.get(" name" , " unknown" )
1354- prompt = f " Process file event: gs:// { bucket} / { name} \n Event type: { event_type} "
1355- else :
1356- prompt = f " Process GCP event: { event_type} \n Data: { event_data} "
1357-
1358- result = await run_agent(prompt)
1359- return {" status" : " success" , " result" : result}
1360-
1361- # === Direct HTTP / Webhook ===
1362- # Format: {"input": "your prompt"} or {"prompt": "your prompt"}
1363- if " input" in request or " prompt" in request:
1364- prompt = request.get(" input" ) or request.get(" prompt" )
1365- result = await run_agent(prompt)
1366- return {" status" : " success" , " result" : result}
1367-
1368- # Unknown format
1369- return {" status" : " error" , " message" : " Unknown request format" , " received_keys" : list (request.keys())}
1370- ```
1371-
1372- ### Local Testing (Before Deployment)
1373-
1374- ** IMPORTANT:** Always test the ` /invoke ` endpoint locally before deploying. Unlike interactive chatbots, batch/event processing is harder to debug in production.
1375-
1376- ``` bash
1377- # Start local backend (default port 8000)
1378- make local-backend
1379-
1380- # Or specify a custom port (useful for parallel development)
1381- make local-backend PORT=8081
+     if "calls" in request:  # BigQuery: {"calls": [[row1], [row2]]}
+         results = await asyncio.gather(*[run_agent(f"Analyze: {row}") for row in request["calls"]])
+         return {"replies": results}
+     if "message" in request:  # Pub/Sub: {"message": {"data": "base64..."}}
+         payload = base64.b64decode(request["message"]["data"]).decode()
+         return {"status": "success", "result": await run_agent(payload)}
+     if "type" in request:  # Eventarc: {"type": "google.cloud...", "data": {...}}
+         return {"status": "success", "result": await run_agent(str(request["data"]))}
+     if "input" in request:  # Direct HTTP: {"input": "prompt"}
+         return {"status": "success", "result": await run_agent(request["input"])}
```
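
For reference, a condensed sketch of the `run_agent` helper, grounded in the fuller implementation removed above: a module-level semaphore plus an ephemeral ADK session per request.

```python
import asyncio
import json

from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types as genai_types

from {{cookiecutter.agent_directory}}.agent import root_agent

semaphore = asyncio.Semaphore(10)  # cap concurrent Gemini calls

async def run_agent(prompt: str) -> str:
    """Stateless, throttled agent run; each call gets a fresh in-memory session."""
    async with semaphore:
        try:
            session_service = InMemorySessionService()
            await session_service.create_session(
                app_name="app", user_id="invoke_user", session_id="invoke_session"
            )
            runner = Runner(agent=root_agent, app_name="app", session_service=session_service)
            final_response = ""
            async for event in runner.run_async(
                user_id="invoke_user",
                session_id="invoke_session",
                new_message=genai_types.Content(
                    role="user", parts=[genai_types.Part.from_text(text=prompt)]
                ),
            ):
                if event.is_final_response() and event.content and event.content.parts:
                    final_response = event.content.parts[0].text
            return final_response
        except Exception as e:
            # Return per-row errors as JSON instead of failing the whole batch
            return json.dumps({"error": str(e)})
```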

- **Test BigQuery batch format:**
+ **Test locally** with `make local-backend`, then curl each format:
```bash
- curl -X POST http://localhost:8000/invoke \
-   -H "Content-Type: application/json" \
+ # BigQuery
+ curl -X POST http://localhost:8000/invoke -H "Content-Type: application/json" \
  -d '{"calls": [["test input 1"], ["test input 2"]]}'
+ # Direct
+ curl -X POST http://localhost:8000/invoke -H "Content-Type: application/json" \
+   -d '{"input": "your prompt here"}'
```

- **Test Pub/Sub format (with base64 encoding):**
- ```bash
- DATA=$(echo -n '{"key": "value"}' | base64)
- curl -X POST http://localhost:8000/invoke \
-   -H "Content-Type: application/json" \
-   -d "{\"message\": {\"data\": \"$DATA\"}}"
- ```
-
- **Test Eventarc format:**
- ```bash
- curl -X POST http://localhost:8000/invoke \
-   -H "Content-Type: application/json" \
-   -d '{
-     "type": "google.cloud.storage.object.v1.finalized",
-     "data": {"bucket": "my-bucket", "name": "file.pdf"}
-   }'
- ```
-
- **What to verify:**
- - Correct format detection (check which branch handles your request)
- - Expected response format (`{"replies": [...]}` for BQ, `{"status": "success"}` for events)
- - Tool calls in logs (for side-effect mode)
- - Error handling for malformed inputs
-
- ### Integration Examples
-
- **BigQuery Remote Function:**
- ```sql
- -- Create connection (one-time setup)
- CREATE EXTERNAL CONNECTION `project.region.bq_connection`
- OPTIONS (cloud_resource_id="//cloudresourcemanager.googleapis.com/projects/PROJECT_ID");
-
- -- Create remote function
- CREATE FUNCTION dataset.analyze_customer(data STRING)
- RETURNS STRING
- REMOTE WITH CONNECTION `project.region.bq_connection`
- OPTIONS (endpoint = 'https://{{cookiecutter.project_name}}.run.app/invoke');
-
- -- Process millions of rows
- SELECT customer_id, dataset.analyze_customer(customer_data) AS analysis
- FROM customers;
- ```
-
- **Pub/Sub Push Subscription:**
+ **Connect to GCP services:**
```bash
- # Create push subscription pointing to /invoke
- gcloud pubsub subscriptions create my-subscription \
-   --topic=my-topic \
+ # Pub/Sub push subscription
+ gcloud pubsub subscriptions create my-sub --topic=my-topic \
  --push-endpoint=https://{{cookiecutter.project_name}}.run.app/invoke
- ```
-
- **Eventarc Trigger:**
- ```bash
- # Trigger on Cloud Storage events
- gcloud eventarc triggers create storage-trigger \
+ # Eventarc trigger
+ gcloud eventarc triggers create my-trigger \
  --destination-run-service={{cookiecutter.project_name}} \
  --destination-run-path=/invoke \
-   --event-filters="type=google.cloud.storage.object.v1.finalized" \
-   --event-filters="bucket=my-bucket"
+   --event-filters="type=google.cloud.storage.object.v1.finalized"
```

- ### Production Considerations
-
- **Rate Limiting & Retry:**
- - Use semaphores to limit concurrent Gemini calls (avoid 429 errors)
- - Implement exponential backoff for transient failures
- - For BigQuery: Raise `TransientError` on 429s to trigger automatic retries
-
- **Error Handling:**
- - Return per-row errors as JSON objects, don't fail entire batch
- - Log errors with trace IDs for debugging
- - Monitor error rates via Cloud Logging/Monitoring
-
- **Cost Control:**
- - Set Cloud Run `--max-instances` to cap concurrent executions
- - Monitor Gemini API usage and set budget alerts
- - Test with small batches before running on production data
-
- ### Reference Implementation
-
- See complete production example with chunking, error handling, and monitoring:
- https://github.com/richardhe-fundamenta/practical-gcp-examples/blob/main/bq-remote-function-agent/customer-advisor/app/fast_api_app.py
-
- **Key patterns from reference:**
- - Async processing with semaphore throttling (`MAX_CONCURRENT_ROWS = 10`)
- - Chunk batching for memory efficiency (`CHUNK_SIZE = 10`)
- - Transient vs permanent error classification
- - Structured output extraction from agent responses
+ **Production tips:** Use semaphores to limit concurrent Gemini calls (avoid 429s), set Cloud Run `--max-instances`, and return per-row errors instead of failing entire batches. See the [reference implementation](https://github.com/richardhe-fundamenta/practical-gcp-examples/blob/main/bq-remote-function-agent/customer-advisor/app/fast_api_app.py) for production patterns.
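
For example, capping instances is a one-line update (a sketch; tune the cap to your traffic and budget):

```bash
# Cap concurrent container instances to bound cost and Gemini call volume
gcloud run services update {{cookiecutter.project_name}} \
  --region=YOUR_REGION \
  --max-instances=10
```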

---
