feat(vespa): Move vespa to standalone lib by junaid-shirur · Pull Request #1231 · xynehq/xyne

junaid-shirur · 2025-11-26T06:58:11Z

Description

Testing

Additional Notes

Summary by CodeRabbit

Chores
- Removed the Vespa search infrastructure: all search schemas, ranking rules, deployment configs, and orchestration scripts have been deleted.
- Removed Vespa-related Docker Compose entries and validation overrides, simplifying the deployment surface.
Refactor
- Cleanup utility updated to operate against a fixed set of content types for bulk-clearing operations.

coderabbitai · 2025-11-26T06:58:22Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This PR removes Vespa deployment/configuration and most Vespa schema files, deletes several Vespa utility scripts, and updates server/scripts/clear-vespa-data.ts to use a hardcoded list of my_content.* schema names and direct Vespa client operations instead of filesystem discovery.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Vespa schema deletions: `server/vespa/schemas/` (e.g., `server/vespa/schemas/chat_.sd`, `server/vespa/schemas/datasource.sd`, `server/vespa/schemas/event.sd`, `server/vespa/schemas/file.sd`, `server/vespa/schemas/kb_items.sd`, `server/vespa/schemas/mail.sd`, `server/vespa/schemas/user*.sd`, `server/vespa/schemas/user_query.sd`) | Removed ~14 complete Vespa schema files including all document/type/field definitions, embeddings, fieldsets, document-summaries, rank-profiles, fuzzy/autocomplete configs, and related indexing/ranking logic. |
| Vespa deployment & Docker compose: `server/vespa/services.xml`, `server/vespa/deploy.sh`, `server/vespa/deploy-docker.sh`, `server/vespa/deploy-pod.sh`, `server/vespa/validation-overrides.xml`, `deployment/docker-compose*.yml` | Deleted Vespa deployment manifests and scripts: services.xml, multiple deploy scripts, validation overrides, and vespa service entries/volumes from docker-compose files. |
| Vespa utilities & rules: `server/vespa/reindex.sh`, `server/vespa/replaceDIMS.ts`, `server/vespa/rules/searchrules.sr` | Removed reindexing, schema-DIMS replacement utility, and Vespa search rules. |
| Clear Vespa data script: `server/scripts/clear-vespa-data.ts` | Replaced filesystem-based schema discovery with a hardcoded list of `my_content.*` schema names; narrowed Vespa imports to client operation types and instantiated a Vespa client for programmatic Get/Update/Delete calls; adjusted deletion and batching/logging logic accordingly. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Script as clear-vespa-data.ts
  participant Vespa as VespaClient
  Note right of Script: uses hardcoded schemas: my_content.*
  Script->>Vespa: list documents for schema (GetDocument / search)
  alt documents found
    loop per-batch
      Script->>Vespa: DeleteDocument (batch)
      Vespa-->>Script: delete result
    end
    Script->>Script: log progress / handle errors
  else no documents
    Vespa-->>Script: empty result
  end
  Script-->>Vespa: optional UpdateDocument (permissions/metadata)
  Vespa-->>Script: result
  Note over Vespa,Script: All interactions use Vespa client calls (programmatic)
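
The list-then-delete-in-batches flow in the diagram can be sketched roughly as follows. This is only an illustration: `DocumentStore`, `clearSchema`, `chunk`, and `makeMemoryStore` are hypothetical names, and the real Vespa client used by `clear-vespa-data.ts` may expose a different API. The sketch collects failed IDs instead of aborting on the first error, which is one possible way to handle the "log progress / handle errors" step.

```typescript
// Hypothetical minimal client surface; the real Vespa client API may differ.
interface DocumentStore {
  listIds(schema: string): Promise<string[]>
  deleteDocument(schema: string, id: string): Promise<void>
}

interface ClearResult {
  deleted: number
  failed: string[]
}

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("batch size must be at least 1")
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Delete every document of one schema, batch by batch, recording failures
// instead of stopping at the first error.
async function clearSchema(
  store: DocumentStore,
  schema: string,
  batchSize: number = 100,
): Promise<ClearResult> {
  const ids = await store.listIds(schema)
  const result: ClearResult = { deleted: 0, failed: [] }
  for (const batch of chunk(ids, batchSize)) {
    const outcomes = await Promise.all(
      batch.map(async (id: string) => {
        try {
          await store.deleteDocument(schema, id)
          return { id, ok: true }
        } catch {
          return { id, ok: false }
        }
      }),
    )
    for (const outcome of outcomes) {
      if (outcome.ok) result.deleted += 1
      else result.failed.push(outcome.id)
    }
  }
  return result
}

// In-memory stand-in used only to exercise the sketch; IDs listed in
// `failing` simulate deletions that the backend rejects.
function makeMemoryStore(
  docs: Record<string, string[]>,
  failing: string[] = [],
): DocumentStore {
  return {
    async listIds(schema: string): Promise<string[]> {
      return [...(docs[schema] ?? [])]
    },
    async deleteDocument(schema: string, id: string): Promise<void> {
      if (failing.indexOf(id) !== -1) throw new Error(`cannot delete ${id}`)
      docs[schema] = (docs[schema] ?? []).filter((d) => d !== id)
    },
  }
}
```

A caller would run `clearSchema` once per entry in the hardcoded `my_content.*` list and log the returned `deleted` count and `failed` IDs for each schema.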

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Fileset is large and heterogeneous (many schema deletions + deployment removals + script refactor).
Pay extra attention to:
- server/scripts/clear-vespa-data.ts — confirm hardcoded schema list correctness and deletion batching/error handling.
- Any references elsewhere in the repo expecting removed schemas or services.xml.
- CI/deployment pipelines and docker-compose changes that relied on the removed Vespa services.
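
A quick way to act on the second point is to scan the remaining files for mentions of the removed paths. The sketch below is an assumption-laden illustration, not part of the PR: `findStaleReferences` and `removedPathFragments` are hypothetical names, and the fragments are the removed paths from the change table with the `server/` prefix dropped so that relative references such as `../vespa/schemas` also match.

```typescript
// Fragments of paths removed by this PR (from the change table, minus the
// leading "server/" so relative references are caught too).
const removedPathFragments: string[] = [
  "vespa/schemas",
  "vespa/services.xml",
  "vespa/deploy.sh",
  "vespa/deploy-docker.sh",
  "vespa/deploy-pod.sh",
  "vespa/validation-overrides.xml",
  "vespa/reindex.sh",
  "vespa/replaceDIMS.ts",
  "vespa/rules/searchrules.sr",
]

interface SourceFile {
  name: string
  content: string
}

interface StaleReference {
  file: string
  line: number // 1-based line number of the match
  fragment: string
}

// Report every line that still mentions one of the removed paths.
function findStaleReferences(
  files: SourceFile[],
  fragments: string[] = removedPathFragments,
): StaleReference[] {
  const hits: StaleReference[] = []
  for (const file of files) {
    const lines = file.content.split("\n")
    lines.forEach((text, index) => {
      for (const fragment of fragments) {
        if (text.indexOf(fragment) !== -1) {
          hits.push({ file: file.name, line: index + 1, fragment })
        }
      }
    })
  }
  return hits
}
```

Plain substring matching will miss paths that are assembled at runtime from several pieces, so an empty result does not prove that nothing depends on the deleted files.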

Possibly related PRs

feat(vespa): Integrate Vespa via @xyne/vespa-ts TypeScript package #724 — centralizes Vespa functionality into a vespa client package and updates client/type imports; directly related to clear-vespa-data.ts import/usage changes.
refactor(vespa): upgrade Vespa image to 8.583.10 and enable GPU runtime #952 — modifies Vespa deployment artifacts (deploy scripts, docker-compose, services) and likely conflicts with removal of deployment/config files here.
fix(@Search):Add New Rank Profile to boost title in @ search #685 — edits rank-profiles and schema logic for files that are deleted in this PR; potential merge conflicts or overlap.

Suggested reviewers

zereraz
shivamashtikar
kalpadhwaryu

Poem

🐰
I nibble old schemas from the tree,
Pack up deploy scripts, set them free,
Hardcoded names now lead the way,
Vespa sleeps — I hop and play. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(vespa): Move vespa to standalone lib' accurately reflects the main changeset, which comprehensively removes Vespa schemas, deployment scripts, and Docker configurations, indicating a significant architectural shift in how Vespa is handled. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d28bf8 and 49eb19b.

📒 Files selected for processing (3)

deployment/docker-compose.dev.yml (0 hunks)
deployment/docker-compose.gpu.yml (0 hunks)
deployment/docker-compose.prod.yml (0 hunks)

💤 Files with no reviewable changes (3)

deployment/docker-compose.prod.yml
deployment/docker-compose.dev.yml
deployment/docker-compose.gpu.yml

gemini-code-assist · 2025-11-26T06:58:37Z

Summary of Changes

Hello @junaid-shirur, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a significant refactoring of the Vespa search engine integration. By moving the Vespa configuration, schema definitions, and deployment logic into a separate, standalone library, the main application's codebase becomes cleaner and more modular. This change streamlines the project structure, reduces internal dependencies on Vespa-specific files, and prepares the system for more efficient management of its search infrastructure.

Highlights

Vespa Integration Externalization: The core Vespa search engine configuration and schema definitions have been moved out of the main application repository, indicating a shift towards a standalone library approach.
Removal of Vespa Configuration Files: All .sd schema definition files, deployment scripts (e.g., deploy-docker.sh, deploy.sh), and core configuration files (services.xml, validation-overrides.xml) previously located in server/vespa/ have been removed.
Simplified Schema Management: The clear-vespa-data.ts script now uses a hardcoded list of schema names instead of dynamically reading .sd files, reflecting the externalization of schema definitions and streamlining schema retrieval.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request is a significant refactoring that moves Vespa-related configurations, schemas, and scripts into a standalone library. This is a great step for modularity. The changes primarily involve the removal of numerous files from the server/vespa/ directory. My review focuses on the clear-vespa-data.ts script, which was updated to accommodate these changes. I've provided a suggestion to improve its maintainability by leveraging constants from the new library instead of hardcoded strings.

gemini-code-assist · 2025-11-26T07:00:40Z

server/scripts/clear-vespa-data.ts

 async function getVespaSchemas(): Promise<string[]> {
-  // Returns prefixed names e.g. "my_content.file"
-  // Determine path relative to the current file's directory
-  const scriptDir = path.dirname(new URL(import.meta.url).pathname)
-  const schemasDir = path.join(scriptDir, "../vespa/schemas")
-  try {
-    const files = await fs.readdir(schemasDir)
-    const schemaNames = files
-      .filter((file) => file.endsWith(".sd"))
-      .map((file) => `my_content.${file.replace(".sd", "")}`) // No incorrect cast here
-    return schemaNames
-  } catch (error) {
-    console.error(`Error reading Vespa schemas directory ${schemasDir}:`, error)
-    return []
-  }
+  const schemaNames = [
+    "file",
+    "user",
+    "mail",
+    "mail_attachment",
+    "event",
+    "chat_message",
+    "chat_container",
+    "chat_user",
+    "chat_team",
+    "chat_attachment",
+    "datasource",
+    "datasource_file",
+    "kb_items",
+    "user_query",
+  ]
+  return schemaNames.map((name) => `my_content.${name}`)
 }


To improve maintainability and avoid magic strings, it would be better to use the schema name constants that are already imported from @xyne/vespa-ts/types (e.g., fileSchema, userSchema).

This also reveals that some schemas (datasource, datasource_file, kb_items) are not imported as constants. If these are also defined in the new library, they should be imported and used as constants as well. This would make this list less brittle to future changes in the library.

async function getVespaSchemas(): Promise<string[]> {
  const schemaNames = [
    fileSchema,
    userSchema,
    mailSchema,
    mailAttachmentSchema,
    eventSchema,
    chatMessageSchema,
    chatContainerSchema,
    chatUserSchema,
    chatTeamSchema,
    chatAttachmentSchema,
    // The following are hardcoded. If constants for these are available
    // from `@xyne/vespa-ts/types`, please import and use them for consistency.
    "datasource",
    "datasource_file",
    "kb_items",
    userQuerySchema,
  ];
  return schemaNames.map((name) => `my_content.${name}`);
}
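
Whichever source the names come from, a small validation step can make the list less brittle. The sketch below is hypothetical: `qualifySchemas` is not part of the PR, and the three local constants merely stand in for exports of `@xyne/vespa-ts/types`, whose values are assumed here to equal the hardcoded strings in the diff.

```typescript
// Stand-ins for constants the review suggests importing from
// `@xyne/vespa-ts/types`; the values are assumed, not confirmed.
const fileSchema = "file"
const userSchema = "user"
const mailSchema = "mail"

const NAMESPACE = "my_content"

// Prefix each schema name with the namespace, rejecting empty names,
// names that already contain a dot, and duplicates.
function qualifySchemas(
  names: readonly string[],
  namespace: string = NAMESPACE,
): string[] {
  const qualified: string[] = []
  for (const raw of names) {
    const name = raw.trim()
    if (name === "") throw new Error("empty schema name")
    if (name.indexOf(".") !== -1) {
      throw new Error(`schema name "${name}" already looks qualified`)
    }
    const full = `${namespace}.${name}`
    if (qualified.indexOf(full) !== -1) {
      throw new Error(`duplicate schema name "${name}"`)
    }
    qualified.push(full)
  }
  return qualified
}

// Mixed list: library constants where they exist, literals elsewhere.
const schemas = qualifySchemas([
  fileSchema,
  userSchema,
  mailSchema,
  "datasource",
  "datasource_file",
  "kb_items",
])
```

Failing fast on a duplicate or already-prefixed name matters here because the script deletes data: a malformed entry would otherwise silently target the wrong schema or none at all.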

@junaid-shirur

junaid-shirur added 2 commits November 26, 2025 12:22

feat(vespa): delete vespa schemas

f010a8d

feat(vespa): remove comment

0d28bf8

junaid-shirur requested review from devesh-juspay, kalpadhwaryu, shivamashtikar and zereraz as code owners November 26, 2025 06:58

gemini-code-assist bot reviewed Nov 26, 2025

feat(vespa): update yml files

49eb19b

junaid-shirur wants to merge 3 commits into main from isolate_vespa
