feat: add Vertex AI Search and Vector Search data connectors for agentic_rag #791

Merged
eliasecchig merged 2 commits into main from feat/vas-data-connector on Feb 16, 2026

@eliasecchig
Collaborator

Summary

  • Migrate data ingestion from shared agent_starter_pack/data_ingestion/ to agent-specific agents/agentic_rag/data_ingestion/
  • Add Vector Search 2.0 Collections API support alongside Vertex AI Search
  • Replace shell scripts with Python scripts for data connector setup and management
  • Add setup-datastore CLI command and sample data for both datastore types
  • Use dedicated asp-rag GCP projects for agentic_rag e2e tests
  • Conditionally enable vectorsearch.googleapis.com API only when datastore type requires it
  • Add --wait flag to connector run for blocking sync support

Changes

  • CLI: Replace --include-data-ingestion flag with auto-derivation from --datastore option; add setup-datastore command
  • Agent: Refactor agentic_rag agent and retrievers to support both Vertex AI Search and Vector Search datastores
  • Terraform: Move datastore-specific infra from shared templates to agents/agentic_rag/deployment/terraform/; add Vector Search resources
  • Scripts: Replace bash scripts with Python scripts for data connector lifecycle management
  • CI/CD: Update build triggers and workflows for new paths and dedicated RAG projects
  • Docs: Update CLI docs and data ingestion guide
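
The auto-derivation described under "CLI" can be pictured with a minimal sketch. This is not the code from `create.py`; the function name `derive_include_data_ingestion` and the strict validation are assumptions made for illustration, and only the two datastore names come from this PR.

```python
from typing import Optional

# Datastore names as they appear in this PR. Treating them as the only
# valid values is an assumption of this sketch.
KNOWN_DATASTORES = {"vertex_ai_search", "vertex_ai_vector_search"}


def derive_include_data_ingestion(datastore: Optional[str]) -> bool:
    """Hypothetical helper: data ingestion is on whenever a datastore is chosen."""
    if datastore is None:
        # No --datastore option given, so no ingestion setup is needed.
        return False
    if datastore not in KNOWN_DATASTORES:
        raise ValueError(f"Unknown datastore type: {datastore!r}")
    return True


if __name__ == "__main__":
    print(derive_include_data_ingestion("vertex_ai_search"))
    print(derive_include_data_ingestion(None))
```

Deriving the value from a single option means there is no way to pass a datastore while forgetting to enable ingestion, which is presumably why the separate flag was dropped.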

…or for vertex_ai_search

- Switch agentic_rag agent from LangChain retriever to native ADK VertexAiSearchTool
  for vertex_ai_search datastore type
- Replace Vertex AI Pipeline-based ingestion with GCS Data Connector approach using
  Discovery Engine setUpDataConnectorV2 API
- Add Terraform null_resource + external data source pattern for managing data
  connectors and retrieving auto-created data store IDs
- Add shell scripts for connector setup, data store ID retrieval, and on-demand sync
- Add Makefile sync-data target for triggering manual data syncs
- Skip data_ingestion pipeline file copy for vertex_ai_search projects
- Update CI/CD pipelines to only include pipeline steps for vertex_ai_vector_search
- Update deployment target service.tf files to use external data source references
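
For readers unfamiliar with the Terraform `external` data source pattern mentioned above: Terraform sends the query to the program as a JSON object on stdin and expects a JSON object whose values are all strings on stdout. The sketch below shows only that contract. The lookup itself is a stand-in; `fake_lookup`, the `collection_id` query key, and the returned ID format are hypothetical and do not reflect how the actual scripts in this PR resolve the data store ID.

```python
import io
import json
import sys
from typing import Callable, Dict, TextIO


def run_external_program(
    lookup: Callable[[Dict[str, str]], str],
    stdin: TextIO = sys.stdin,
    stdout: TextIO = sys.stdout,
) -> None:
    """Read a JSON query from stdin and write a JSON result to stdout."""
    query = json.load(stdin)
    data_store_id = lookup(query)
    # The external data source requires every value in the result to be a string.
    json.dump({"data_store_id": str(data_store_id)}, stdout)


def fake_lookup(query: Dict[str, str]) -> str:
    """Hypothetical stand-in for the real API call that finds the data store."""
    return f"{query['collection_id']}_gcs_store"


if __name__ == "__main__":
    # Simulate Terraform piping a query into the program.
    out = io.StringIO()
    run_external_program(fake_lookup, io.StringIO('{"collection_id": "demo"}'), out)
    print(out.getvalue())
```

Keeping the lookup injectable makes the stdin/stdout contract testable without credentials or network access.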
@gemini-code-assist
Contributor

Summary of Changes

Hello @eliasecchig, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the agentic_rag agent by integrating robust data ingestion capabilities for both Vertex AI Search and the new Vertex AI Vector Search 2.0. The changes streamline the setup and management of data connectors, moving from shared, generic scripts to agent-specific Python-based solutions. This refactoring improves modularity, simplifies the developer experience, and ensures that the RAG agents can leverage advanced search functionalities with greater ease and efficiency.

Highlights

  • Data Ingestion Migration: Migrated data ingestion for the agentic_rag agent from a shared directory to an agent-specific agents/agentic_rag/data_ingestion/ path, enhancing modularity.
  • Vertex AI Vector Search 2.0 Support: Added comprehensive support for Vertex AI Vector Search 2.0 Collections API, alongside existing Vertex AI Search capabilities, for improved RAG functionalities.
  • Python-based Data Connector Management: Replaced previous shell scripts with Python scripts for setting up and managing data connectors, streamlining the process and improving maintainability.
  • Simplified Datastore Setup: Introduced a new setup-datastore CLI command and included sample data to simplify the provisioning and initial data loading for both Vertex AI Search and Vector Search 2.0 datastores.
  • Dedicated E2E Test Projects: Configured dedicated GCP projects (asp-rag) for agentic_rag end-to-end tests, ensuring isolated and consistent testing environments.
  • Conditional API Enablement: Implemented conditional enablement of the vectorsearch.googleapis.com API, activating it only when the selected datastore type requires it.
  • CLI Flag Simplification: Removed the --include-data-ingestion CLI flag; data ingestion is now automatically enabled and configured based on the presence and value of the --datastore option.
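
The conditional API enablement highlighted above is implemented in Terraform in this PR; the same rule can be sketched in Python to make the logic explicit. The base service list and the function name `services_to_enable` are assumptions for illustration, as is the assumption that `vertex_ai_vector_search` is the datastore type that needs the extra API.

```python
from typing import List, Optional

# Placeholder base list for illustration; the real list lives in the Terraform files.
BASE_SERVICES = ["aiplatform.googleapis.com"]
VECTOR_SEARCH_API = "vectorsearch.googleapis.com"


def services_to_enable(datastore_type: Optional[str]) -> List[str]:
    """Return the APIs to enable, adding Vector Search only when it is needed."""
    services = list(BASE_SERVICES)  # copy so the module-level list is never mutated
    if datastore_type == "vertex_ai_vector_search":
        services.append(VECTOR_SEARCH_API)
    return services


if __name__ == "__main__":
    print(services_to_enable("vertex_ai_search"))
    print(services_to_enable("vertex_ai_vector_search"))
```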
Changelog
  • .cloudbuild/terraform/apis.tf
    • Enabled Cloud Resource Manager API for RAG E2E projects.
  • .cloudbuild/terraform/build_triggers.tf
    • Updated data ingestion paths and removed the --include-data-ingestion flag from build triggers.
    • Adjusted project mapping for agentic_rag E2E tests.
  • .cloudbuild/terraform/service_account.tf
    • Granted IAM roles to the CI/CD runner service account for RAG E2E project environments.
  • .cloudbuild/terraform/variables.tf
    • Added a variable for agentic_rag E2E project mapping.
    • Updated cleanup project IDs to include RAG-specific projects.
  • .cloudbuild/terraform/vars/env.tfvars
    • Defined project mappings for agentic_rag E2E environments.
  • GEMINI.md
    • Removed the include_data_ingestion parameter from a configuration example.
  • agent_starter_pack/agents/agentic_rag/.template/templateconfig.yaml
    • Added google-cloud-vectorsearch to extra dependencies.
  • agent_starter_pack/agents/agentic_rag/app/agent.py
    • Refactored agent to support Vertex AI Search Tool or Vector Search 2.0 collection search based on datastore type.
    • Removed VertexAIEmbeddings and related embedding generation logic.
    • Removed import of templates.py.
  • agent_starter_pack/agents/agentic_rag/app/retrievers.py
    • Replaced old retriever and compressor implementations with a new search_collection function for Vector Search 2.0.
  • agent_starter_pack/agents/agentic_rag/app/templates.py
    • Removed the file.
  • agent_starter_pack/agents/agentic_rag/data_ingestion/README.md
    • Added a new README detailing the data ingestion pipeline for Vertex AI Vector Search 2.0.
  • agent_starter_pack/agents/agentic_rag/data_ingestion/data_ingestion_pipeline/components/ingest_data.py
    • Added a component to ingest processed data into a Vector Search 2.0 Collection.
  • agent_starter_pack/agents/agentic_rag/data_ingestion/data_ingestion_pipeline/components/process_data.py
    • Added a component to process StackOverflow data for Vector Search 2.0 ingestion.
  • agent_starter_pack/agents/agentic_rag/data_ingestion/data_ingestion_pipeline/pipeline.py
    • Updated the data ingestion pipeline to use new components and parameters for Vector Search 2.0.
    • Removed Vertex AI Search specific ingestion logic.
  • agent_starter_pack/agents/agentic_rag/data_ingestion/data_ingestion_pipeline/submit_pipeline.py
    • Modified the pipeline submission script to support local execution and updated argument parsing for Vector Search 2.0.
    • Removed aiplatform import.
  • agent_starter_pack/agents/agentic_rag/data_ingestion/pyproject.toml
    • Updated dependencies for data ingestion components, including bigframes, docker, google-cloud-bigquery, google-cloud-vectorsearch, langchain-text-splitters, markdownify, and swifter.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/dev/vector_search.tf
    • Added Terraform resources for Vector Search 2.0 collection and GCS bucket in the dev environment.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/dev/vector_search_iam.tf
    • Added IAM roles for the Vertex AI Pipeline service account in the dev environment.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/dev/vector_search_variables.tf
    • Defined variables for Vector Search 2.0 configuration in the dev environment.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/dev/vertex_ai_search.tf
    • Added Terraform resources for Vertex AI Search data connector and search engine in the dev environment.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/dev/vertex_ai_search_variables.tf
    • Defined variables for Vertex AI Search configuration in the dev environment.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/scripts/delete_data_connector.py
    • Added a Python script to delete a GCS Data Connector.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/scripts/delete_vector_search_collection.py
    • Added a Python script to delete a Vector Search 2.0 Collection.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/scripts/get_data_store_id.py
    • Added a Python script to retrieve the data store ID for Terraform external data source.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/scripts/setup_data_connector.py
    • Added a Python script to set up a GCS Data Connector for Vertex AI Search.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/scripts/setup_vector_search_collection.py
    • Added a Python script to create a Vector Search 2.0 Collection.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/scripts/start_connector_run.py
    • Added a Python script to import documents into a Vertex AI Search data store.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/vector_search.tf
    • Added Terraform resources for Vector Search 2.0 collections and GCS buckets across environments.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/vector_search_github.tf
    • Added GitHub Actions variables for Vector Search 2.0 configuration.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/vector_search_iam.tf
    • Added IAM roles for Vertex AI Pipeline service accounts for Vector Search 2.0.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/vector_search_service_accounts.tf
    • Added service account for Vertex AI Pipelines for Vector Search 2.0.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/vector_search_variables.tf
    • Defined variables for Vector Search 2.0 configuration.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/vertex_ai_search.tf
    • Added Terraform resources for Vertex AI Search data connectors and search engines across environments.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/vertex_ai_search_github.tf
    • Added GitHub Actions variables for Vertex AI Search configuration.
  • agent_starter_pack/agents/agentic_rag/deployment/terraform/vertex_ai_search_variables.tf
    • Defined variables for Vertex AI Search configuration.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/build_triggers.tf
    • Updated Cloud Build triggers to reflect new data ingestion paths and parameters.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/dev/apis.tf
    • Conditionally enabled the vectorsearch.googleapis.com API.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/dev/iam.tf
    • Removed generic Vertex AI pipeline service account and IAM roles.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/dev/providers.tf
    • Added null and external Terraform providers for Vertex AI Search data connector scripts.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/dev/storage.tf
    • Removed generic data ingestion storage and datastore resources.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/dev/variables.tf
    • Removed generic data ingestion related variables.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/dev/vars/env.tfvars
    • Updated default data_store_region to global.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/github.tf
    • Removed generic GitHub Actions variables for data ingestion.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/iam.tf
    • Removed generic Vertex AI pipeline service account and IAM roles.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/locals.tf
    • Conditionally enabled the vectorsearch.googleapis.com API in service lists.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/providers.tf
    • Added null and external Terraform providers for Vertex AI Search data connector scripts.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/service_accounts.tf
    • Removed generic Vertex AI pipeline service account.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/storage.tf
    • Removed generic data ingestion storage and datastore resources.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/variables.tf
    • Removed generic data ingestion related variables.
  • agent_starter_pack/base_templates/_shared/deployment/terraform/vars/env.tfvars
    • Updated default data_store_region to global and removed vector search specific variables.
  • agent_starter_pack/base_templates/python/.cloudbuild/deploy-to-prod.yaml
    • Updated Cloud Build workflow for data ingestion pipeline deployment to production.
  • agent_starter_pack/base_templates/python/.cloudbuild/staging.yaml
    • Updated Cloud Build workflow for data ingestion pipeline deployment to staging.
  • agent_starter_pack/base_templates/python/.github/workflows/deploy-to-prod.yaml
    • Updated GitHub Actions workflow for data ingestion pipeline deployment to production.
  • agent_starter_pack/base_templates/python/.github/workflows/staging.yaml
    • Updated GitHub Actions workflow for data ingestion pipeline deployment to staging.
  • agent_starter_pack/base_templates/python/Makefile
    • Updated Makefile targets for data ingestion setup and execution, replacing setup-dev-env with setup-datastore for data ingestion scenarios.
    • Modified environment variables for deployment.
  • agent_starter_pack/cli/commands/create.py
    • Removed the --include-data-ingestion CLI flag.
    • Updated datastore selection logic to auto-derive data ingestion enablement.
  • agent_starter_pack/cli/commands/enhance.py
    • Removed the --include-data-ingestion CLI flag and related logic.
  • agent_starter_pack/cli/utils/generation_metadata.py
    • Added include_data_ingestion to the list of keys to skip when generating CLI arguments.
  • agent_starter_pack/cli/utils/template.py
    • Updated conditional file exclusion for data_ingestion based on datastore_type.
    • Refined datastore prompting logic.
    • Replaced copy_data_ingestion_files with copy_sample_data_files.
    • Adjusted prototype mode deployment cleanup to preserve relevant Terraform for datastore setup.
  • agent_starter_pack/data_ingestion/README.md
    • Removed the file, as data ingestion documentation is now agent-specific.
  • agent_starter_pack/data_ingestion/data_ingestion_pipeline/components/ingest_data.py
    • Removed the file, as the component is now agent-specific.
  • agent_starter_pack/data_ingestion/data_ingestion_pipeline/components/process_data.py
    • Removed the file, as the component is now agent-specific.
  • agent_starter_pack/deployment_targets/agent_engine/python/deployment/terraform/dev/service.tf
    • Updated environment variables for DATA_STORE_ID to use external data source.
    • Updated VECTOR_SEARCH_COLLECTION environment variable format.
  • agent_starter_pack/deployment_targets/agent_engine/python/deployment/terraform/service.tf
    • Updated local variables for data_store_ids to use external data source.
    • Updated vector_search_collections locals to reflect new collection path format.
  • agent_starter_pack/deployment_targets/cloud_run/python/deployment/terraform/dev/service.tf
    • Updated environment variables for DATA_STORE_ID to use external data source.
    • Updated VECTOR_SEARCH_COLLECTION environment variable format.
  • agent_starter_pack/deployment_targets/cloud_run/python/deployment/terraform/service.tf
    • Updated local variables for data_store_ids to use external data source.
    • Updated vector_search_collections locals to reflect new collection path format.
  • docs/cli/create.md
    • Updated documentation to remove --include-data-ingestion and clarify --datastore behavior.
  • docs/cli/enhance.md
    • Updated documentation to remove --include-data-ingestion and clarify --datastore behavior.
  • docs/guide/data-ingestion.md
    • Rewrote the data ingestion guide to reflect new approaches for Vertex AI Search and Vector Search 2.0, including make setup-datastore and make sync-data.
  • docs/remote-templates/creating-remote-templates.md
    • Removed include_data_ingestion from example template configuration.
  • docs/remote-templates/using-remote-templates.md
    • Updated examples to remove --include-data-ingestion.
  • llm.txt
    • Updated CLI help text to remove --include-data-ingestion.
  • tests/cicd/test_e2e_deployment.py
    • Updated E2E test configurations to remove --include-data-ingestion.
  • tests/cli/commands/test_create.py
    • Updated create command tests to remove --include-data-ingestion.
  • tests/cli/commands/test_enhance.py
    • Updated enhance command tests to reflect removal of --include-data-ingestion and changes in data ingestion file population.
  • tests/cli/utils/test_generation_metadata.py
    • Updated tests for metadata to CLI arguments mapping, skipping include_data_ingestion.
  • tests/fixtures/makefile_hashes.json
    • Updated Makefile hash snapshots due to changes in data ingestion targets and environment variables.
  • tests/fixtures/makefile_snapshots/agentic_rag_cloud_run_vector_search.makefile
    • Updated Makefile snapshot for Vector Search configuration, reflecting new targets and environment variables.
  • tests/fixtures/makefile_snapshots/agentic_rag_cloud_run_vertex_search.makefile
    • Updated Makefile snapshot for Vertex AI Search configuration, reflecting new targets and environment variables.
  • tests/integration/test_makefile_usability.py
    • Updated Makefile usability tests to remove --include-data-ingestion.
  • tests/unit/test_makefile_template.py
    • Updated unit tests for Makefile templates to reflect new data ingestion targets and parameters for Vertex AI Search and Vector Search.
  • tests/utils/get_agents.py
    • Updated agent test combinations to remove --include-data-ingestion.
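
The `generation_metadata.py` entry above (skipping `include_data_ingestion` when turning stored metadata back into CLI arguments) could look roughly like the sketch below. The function name `metadata_to_cli_args`, the flag-naming rule, and the handling of booleans are assumptions; only the skipped key comes from this PR.

```python
from typing import Any, Dict, List

# Keys that are derived from other options and must not be emitted as flags.
SKIP_KEYS = {"include_data_ingestion"}


def metadata_to_cli_args(metadata: Dict[str, Any]) -> List[str]:
    """Hypothetical mapping from stored generation metadata to CLI arguments."""
    args: List[str] = []
    for key, value in metadata.items():
        if key in SKIP_KEYS or value is None or value is False:
            continue
        flag = "--" + key.replace("_", "-")
        if value is True:
            args.append(flag)  # boolean switches take no value
        else:
            args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    print(metadata_to_cli_args(
        {"datastore": "vertex_ai_search", "include_data_ingestion": True}
    ))
```

Skipping the key avoids regenerating a command line that contains a flag the CLI no longer accepts.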
@eliasecchig force-pushed the feat/vas-data-connector branch from 0218d42 to 4d16d3f on February 16, 2026 18:54
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a significant and well-executed refactoring to introduce support for Vertex AI Search and Vector Search 2.0 data connectors for the agentic_rag agent. The changes are comprehensive, touching everything from the agent logic and data ingestion pipelines to Terraform infrastructure, CI/CD workflows, and documentation. Key improvements include migrating data ingestion to be agent-specific, replacing shell scripts with more maintainable Python scripts for resource management, and simplifying the CLI by removing the --include-data-ingestion flag. The code is generally of high quality. I've identified a few opportunities to reduce code duplication in the Terraform configuration and to improve the configurability of the data processing component. Overall, this is an excellent contribution that enhances the flexibility and usability of the starter pack.

@eliasecchig force-pushed the feat/vas-data-connector branch 3 times, most recently from d3ec170 to b5d50b2, on February 16, 2026 19:51
…tic_rag

- Migrate data ingestion from shared location to agent-specific `agents/agentic_rag/data_ingestion/`
- Replace shell scripts with Python scripts for data connector setup and management
- Add Vector Search 2.0 Collections API support alongside Vertex AI Search
- Add `setup-datastore` CLI command and sample data for both datastore types
- Use dedicated asp-rag GCP projects for agentic_rag e2e tests
- Conditionally enable vectorsearch API only when datastore type requires it
- Add --wait flag to connector run for blocking sync support
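
A blocking sync of the kind the `--wait` flag provides is usually a poll-until-terminal loop. The sketch below illustrates that general technique only; the state names, the `get_state` callback, and the default intervals are hypothetical and are not taken from `start_connector_run.py`.

```python
import time
from typing import Callable

# Hypothetical terminal state names; the real API may use different ones.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_sync(
    get_state: Callable[[], str],
    poll_interval: float = 10.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until it reports a terminal state or the timeout expires."""
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"Sync still in state {state!r} after {timeout} seconds")
        sleep(poll_interval)


if __name__ == "__main__":
    # Simulated run that finishes on the third poll.
    states = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
    print(wait_for_sync(lambda: next(states), poll_interval=0))
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.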
@eliasecchig force-pushed the feat/vas-data-connector branch from b5d50b2 to 9d43485 on February 16, 2026 23:05
@eliasecchig merged commit ecb8c27 into main on Feb 16, 2026
40 checks passed