
feat(vespa): Move vespa to standalone lib#1231

Open
junaid-shirur wants to merge 3 commits into main from isolate_vespa

Conversation

@junaid-shirur
Collaborator

@junaid-shirur junaid-shirur commented Nov 26, 2025

Description

Testing

Additional Notes

Summary by CodeRabbit

  • Chores

    • Removed the Vespa search infrastructure: all search schemas, ranking rules, deployment configs, and orchestration scripts have been deleted.
    • Removed Vespa-related Docker compose and validation overrides, and simplified deployment surface.
  • Refactor

    • Cleanup utility updated to operate against a fixed set of content types for bulk-clearing operations.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Contributor

coderabbitai bot commented Nov 26, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This PR removes Vespa deployment/configuration and most Vespa schema files, deletes several Vespa utility scripts, and updates server/scripts/clear-vespa-data.ts to use a hardcoded list of my_content.* schema names and direct Vespa client operations instead of filesystem discovery.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Vespa schema deletions: server/vespa/schemas/* (e.g., server/vespa/schemas/chat_*.sd, server/vespa/schemas/datasource*.sd, server/vespa/schemas/event.sd, server/vespa/schemas/file.sd, server/vespa/schemas/kb_items.sd, server/vespa/schemas/mail*.sd, server/vespa/schemas/user*.sd, server/vespa/schemas/user_query.sd) | Removed ~14 complete Vespa schema files including all document/type/field definitions, embeddings, fieldsets, document-summaries, rank-profiles, fuzzy/autocomplete configs, and related indexing/ranking logic. |
| Vespa deployment & Docker compose: server/vespa/services.xml, server/vespa/deploy.sh, server/vespa/deploy-docker.sh, server/vespa/deploy-pod.sh, server/vespa/validation-overrides.xml, deployment/docker-compose*.yml | Deleted Vespa deployment manifests and scripts: services.xml, multiple deploy scripts, validation overrides, and vespa service entries/volumes from docker-compose files. |
| Vespa utilities & rules: server/vespa/reindex.sh, server/vespa/replaceDIMS.ts, server/vespa/rules/searchrules.sr | Removed reindexing, schema-DIMS replacement utility, and Vespa search rules. |
| Clear Vespa data script: server/scripts/clear-vespa-data.ts | Replaced filesystem-based schema discovery with a hardcoded list of my_content.* schema names; narrowed Vespa imports to client operation types and instantiated a Vespa client for programmatic Get/Update/Delete calls; adjusted deletion and batching/logging logic accordingly. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Script as clear-vespa-data.ts
  participant Vespa as VespaClient
  note right of Script `#DDEEFF`: uses hardcoded schemas: my_content.*
  Script->>Vespa: list documents for schema (GetDocument / search)
  alt documents found
    loop per-batch
      Script->>Vespa: DeleteDocument (batch)
      Vespa-->>Script: delete result
    end
    Script->>Script: log progress / handle errors
  else no documents
    Vespa-->>Script: empty result
  end
  Script-->>Vespa: optional UpdateDocument (permissions/metadata)
  Vespa-->>Script: result
  Note over Vespa,Script: All interactions use Vespa client calls (programmatic)
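
The list-then-delete loop in the diagram could look roughly like the sketch below. The `DocClient` interface, its method names, and `clearSchema` are hypothetical stand-ins invented for illustration; they are not the API of the Vespa client used in this PR, and the actual script may batch and log differently.

```typescript
// Hypothetical minimal client surface (an assumption, not the real Vespa client API).
interface DocClient {
  listDocumentIds(schema: string, limit: number): Promise<string[]>
  deleteDocument(schema: string, docId: string): Promise<void>
}

// Repeatedly fetch up to `batchSize` ids for one schema and delete them,
// until the schema reports no documents or a whole batch fails.
async function clearSchema(
  client: DocClient,
  schema: string,
  batchSize = 100,
): Promise<{ deleted: number; failedAttempts: number }> {
  let deleted = 0
  let failedAttempts = 0
  for (;;) {
    const ids = await client.listDocumentIds(schema, batchSize)
    if (ids.length === 0) break // "no documents" branch of the diagram

    // Delete the batch concurrently and keep going past individual failures.
    const results = await Promise.allSettled(
      ids.map((id) => client.deleteDocument(schema, id)),
    )
    const ok = results.filter((r) => r.status === "fulfilled").length
    deleted += ok
    // Counts attempts, so a document that fails twice is counted twice.
    failedAttempts += ids.length - ok
    console.log(`${schema}: deleted ${deleted}, failed attempts ${failedAttempts}`)

    // If nothing in the batch could be deleted, stop instead of looping forever.
    if (ok === 0) break
  }
  return { deleted, failedAttempts }
}
```

Stopping when an entire batch fails is a deliberate choice in this sketch: documents that cannot be deleted would otherwise be listed again on every iteration.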

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Fileset is large and heterogeneous (many schema deletions + deployment removals + script refactor).
  • Pay extra attention to:
    • server/scripts/clear-vespa-data.ts — confirm hardcoded schema list correctness and deletion batching/error handling.
    • Any references elsewhere in the repo expecting removed schemas or services.xml.
    • CI/deployment pipelines and docker-compose changes that relied on the removed Vespa services.
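
One way to check for leftover references to the removed files is a small scan over the working tree, sketched below. The pattern list, the skipped directory names, and the `findStaleReferences` helper are illustrative assumptions, not part of this PR; adjust them to whatever the repository actually referenced.

```typescript
import * as fs from "node:fs"
import * as path from "node:path"

// Illustrative patterns only (assumed); extend with any other removed paths.
const STALE_PATTERNS = ["vespa/schemas", "services.xml", "validation-overrides.xml"]
// Directories that are usually not worth scanning (assumed defaults).
const SKIP_DIRS = new Set(["node_modules", ".git", "dist"])

interface Hit {
  file: string
  line: number // 1-based line number
  pattern: string
}

// Walk `root` recursively and report every line containing one of `patterns`.
function findStaleReferences(root: string, patterns: string[] = STALE_PATTERNS): Hit[] {
  const hits: Hit[] = []
  const walk = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name)
      if (entry.isDirectory()) {
        if (!SKIP_DIRS.has(entry.name)) walk(full)
        continue
      }
      if (!entry.isFile()) continue
      const lines = fs.readFileSync(full, "utf8").split("\n")
      lines.forEach((text, i) => {
        for (const pattern of patterns) {
          if (text.includes(pattern)) hits.push({ file: full, line: i + 1, pattern })
        }
      })
    }
  }
  walk(root)
  return hits
}
```

Running `findStaleReferences(".")` from the repository root and printing each hit gives a quick list of files to review before merging; plain substring matching will produce some false positives.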

Possibly related PRs

Suggested reviewers

  • zereraz
  • shivamashtikar
  • kalpadhwaryu

Poem

🐰
I nibble old schemas from the tree,
Pack up deploy scripts, set them free,
Hardcoded names now lead the way,
Vespa sleeps — I hop and play. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(vespa): Move vespa to standalone lib' accurately reflects the main changeset, which comprehensively removes Vespa schemas, deployment scripts, and Docker configurations, indicating a significant architectural shift in how Vespa is handled. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d28bf8 and 49eb19b.

📒 Files selected for processing (3)
  • deployment/docker-compose.dev.yml (0 hunks)
  • deployment/docker-compose.gpu.yml (0 hunks)
  • deployment/docker-compose.prod.yml (0 hunks)
💤 Files with no reviewable changes (3)
  • deployment/docker-compose.prod.yml
  • deployment/docker-compose.dev.yml
  • deployment/docker-compose.gpu.yml


@gemini-code-assist
Contributor

Summary of Changes

Hello @junaid-shirur, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a significant refactoring of the Vespa search engine integration. By moving the Vespa configuration, schema definitions, and deployment logic into a separate, standalone library, the main application's codebase becomes cleaner and more modular. This change streamlines the project structure, reduces internal dependencies on Vespa-specific files, and prepares the system for more efficient management of its search infrastructure.

Highlights

  • Vespa Integration Externalization: The core Vespa search engine configuration and schema definitions have been moved out of the main application repository, indicating a shift towards a standalone library approach.
  • Removal of Vespa Configuration Files: All .sd schema definition files, deployment scripts (e.g., deploy-docker.sh, deploy.sh), and core configuration files (services.xml, validation-overrides.xml) previously located in server/vespa/ have been removed.
  • Simplified Schema Management: The clear-vespa-data.ts script now uses a hardcoded list of schema names instead of dynamically reading .sd files, reflecting the externalization of schema definitions and streamlining schema retrieval.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is a significant refactoring that moves Vespa-related configurations, schemas, and scripts into a standalone library. This is a great step for modularity. The changes primarily involve the removal of numerous files from the server/vespa/ directory. My review focuses on the clear-vespa-data.ts script, which was updated to accommodate these changes. I've provided a suggestion to improve its maintainability by leveraging constants from the new library instead of hardcoded strings.

Comment on lines 17 to 35
 async function getVespaSchemas(): Promise<string[]> {
-  // Returns prefixed names e.g. "my_content.file"
-  // Determine path relative to the current file's directory
-  const scriptDir = path.dirname(new URL(import.meta.url).pathname)
-  const schemasDir = path.join(scriptDir, "../vespa/schemas")
-  try {
-    const files = await fs.readdir(schemasDir)
-    const schemaNames = files
-      .filter((file) => file.endsWith(".sd"))
-      .map((file) => `my_content.${file.replace(".sd", "")}`) // No incorrect cast here
-    return schemaNames
-  } catch (error) {
-    console.error(`Error reading Vespa schemas directory ${schemasDir}:`, error)
-    return []
-  }
+  const schemaNames = [
+    "file",
+    "user",
+    "mail",
+    "mail_attachment",
+    "event",
+    "chat_message",
+    "chat_container",
+    "chat_user",
+    "chat_team",
+    "chat_attachment",
+    "datasource",
+    "datasource_file",
+    "kb_items",
+    "user_query",
+  ]
+  return schemaNames.map((name) => `my_content.${name}`)
 }

medium

To improve maintainability and avoid magic strings, it would be better to use the schema name constants that are already imported from @xyne/vespa-ts/types (e.g., fileSchema, userSchema).

This also reveals that some schemas (datasource, datasource_file, kb_items) are not imported as constants. If these are also defined in the new library, they should be imported and used as constants as well. This would make this list less brittle to future changes in the library.

async function getVespaSchemas(): Promise<string[]> {
  const schemaNames = [
    fileSchema,
    userSchema,
    mailSchema,
    mailAttachmentSchema,
    eventSchema,
    chatMessageSchema,
    chatContainerSchema,
    chatUserSchema,
    chatTeamSchema,
    chatAttachmentSchema,
    // The following are hardcoded. If constants for these are available
    // from `@xyne/vespa-ts/types`, please import and use them for consistency.
    "datasource",
    "datasource_file",
    "kb_items",
    userQuerySchema,
  ];
  return schemaNames.map((name) => `my_content.${name}`);
}
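
A further guard, building on the suggestion above, is to fail fast if the list ever contains a duplicate name. The sketch below is abbreviated and self-contained: the locally declared `fileSchema` and `userSchema` are stand-ins for the constants the review says come from `@xyne/vespa-ts/types`, their string values are assumed to match the hardcoded list in the diff, and `prefixedSchemas` is a hypothetical helper rather than code from this PR.

```typescript
// Stand-ins for constants exported by `@xyne/vespa-ts/types` (values assumed).
const fileSchema = "file"
const userSchema = "user"

// Abbreviated list mixing constants with names that still lack a constant.
const SCHEMA_NAMES = [fileSchema, userSchema, "datasource", "kb_items"] as const

// Union of the literal names above, so a typo elsewhere is a compile-time error.
type SchemaName = (typeof SCHEMA_NAMES)[number]

// Hypothetical helper: reject duplicates, then add the namespace prefix.
function prefixedSchemas(names: readonly string[], namespace = "my_content"): string[] {
  const seen = new Set<string>()
  for (const name of names) {
    if (seen.has(name)) throw new Error(`duplicate schema name: ${name}`)
    seen.add(name)
  }
  return names.map((name) => `${namespace}.${name}`)
}

// Example of the compile-time check: "kb_item" (missing "s") would not type-check.
const example: SchemaName = "kb_items"
console.log(prefixedSchemas(SCHEMA_NAMES), example)
```

The `as const` assertion keeps each entry as a string literal type, which is what makes the `SchemaName` union useful; without it the array would widen to `string[]`.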



2 participants