feat: ls tool addtiion kB search partion jaf logging#1272

Open
Aayushjshah wants to merge 1 commit into main from feat/messagBackup

Conversation


@Aayushjshah Aayushjshah commented Mar 8, 2026

Description

Testing

Additional Notes

Summary by CodeRabbit

  • New Features

    • Added knowledge base navigation and listing tool for browsing accessible entries.
    • Introduced ability to exclude previously seen search results across tools.
  • Improvements

    • Enhanced search tool descriptions across Gmail, Drive, Contacts, Calendar, and Slack with clearer guidance on when to use each tool.
    • Improved Slack message search with better filtering and thread handling.
    • Expanded knowledge base search with scoped access and better result organization.
  • Internal

    • Added comprehensive structured logging for trace events and chat operations.
    • Enhanced database schema for knowledge base projection caching.


coderabbitai bot commented Mar 8, 2026

📝 Walkthrough

This PR implements a comprehensive knowledge base flow system with advanced listing and search capabilities, refactors logging across JAF modules, expands tool schemas with enhanced descriptions, adds database tables for projection caching, and replaces the legacy knowledge base tool with the new flow-based implementation.

Changes

Cohort / File(s) Summary
Knowledge Base Flow Architecture
server/api/chat/tools/knowledgeBaseFlow.ts
Introduces 1,400+ lines implementing KB navigation, search, and LS workflows with type-safe schema bridging, repository abstractions, scope-aware access control, Vespa-backed search, and comprehensive testing utilities. Exports KnowledgeBaseTarget types, LsKnowledgeBaseTool, and searchKnowledgeBaseTool.
Knowledge Base Tool Schemas
server/api/chat/tool-schemas.ts
Major schema overhaul (+534/-189): JSON Schema utilities for dynamic descriptions, expanded ToolCategory enum (Finalization, Fallback), richer tool outputs and contexts, refined knowledge-base exports, updated tool registry with new LS tool and knowledge-base examples, and standardized parameter descriptions.
JAF Integration & Logging
server/api/chat/jaf-logging.ts, server/api/chat/jaf-adapter.ts, server/api/chat/jaf-provider.ts
New centralized JAF trace event logging module (+297 lines) with type-safe handlers for 13+ event types and data preprocessing. JAF adapter refactors imports with type-only declarations. JAF provider adds extensive structured logging, contextual metadata, and MakeXyneJAFProviderOptions export (+161/-107 lines).
Agent Prompt & Tool Updates
server/api/chat/agentPromptCreation.ts, server/api/chat/tools/global/index.ts, server/api/chat/tools/google/*, server/api/chat/tools/slack/getSlackMessages.ts
AgentPromptSections extended with knowledgeBaseWorkflow field. Tool descriptions clarified and contextualized (searchGlobal, Gmail, Calendar, Drive, Contacts). Slack tool adds excludedIds filtering and expanded descriptions.
Tool Schemas & Utilities
server/api/chat/tools/schemas.ts
Adds excludedIdsSchema for suppressing prior results and extends baseToolParams; updates sortBySchema with optional modifier and clarified descriptions.
Knowledge Base Legacy Tool Removal
server/api/chat/tools/knowledgeBase.ts
Entire 139-line searchKnowledgeBaseTool module deleted; functionality migrated to knowledgeBaseFlow.ts.
Database Schema & Functions
server/db/schema/knowledgeBase.ts, server/db/knowledgeBase.ts
New collectionLsProjections table with built-from-source-updated-at tracking. New collection column lsProjectionSourceUpdatedAt. Added 5 public DB functions for projection lifecycle (touch, fetch, upsert, error recording). Integrated projection touches in item/folder creation and deletion paths (+113 lines).
Search & Vespa Integration
server/search/vespa.ts, server/search/utils.ts
searchVespaAgent signature extended with AgentApps parameter; structured logging for incoming options and processed selections. extractCollectionVespaIds adds translation logging (+12 lines).
File Processing & Metadata
server/queue/fileProcessor.ts
New mergeCollectionItemMetadata utility function consolidates existing metadata with computed processing results (+59/-26 lines).
Comprehensive Test Coverage
server/tests/knowledgeBaseFlow.test.ts, server/tests/knowledgeBaseTargetTranslation.test.ts, server/tests/messageAgentsFragments.test.ts
Added 1,490+ lines of test suites covering KB path canonicalization, target resolution, LS listings, projection rebuild logic, search filtering, Vespa translation, and excludedIds injection. Fixtures, mocks, and helpers included.
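The projection cache summarized above pairs a builtFromSourceUpdatedAt timestamp on the cached row with the collection's lsProjectionSourceUpdatedAt column. A minimal staleness check along those lines could look like the sketch below; the row types are simplified stand-ins (the real schema carries more columns), so treat this as illustrative rather than the actual implementation.

```typescript
// A cached ls projection is reusable only if it was built from the same
// (or a newer) source timestamp than the collection currently carries.
interface CollectionRow {
  lsProjectionSourceUpdatedAt: Date
}

interface ProjectionRow {
  builtFromSourceUpdatedAt: Date
  projection: unknown
}

function isProjectionFresh(
  collection: CollectionRow,
  cached: ProjectionRow | null,
): boolean {
  if (!cached) return false // never built: must rebuild
  return (
    cached.builtFromSourceUpdatedAt.getTime() >=
    collection.lsProjectionSourceUpdatedAt.getTime()
  )
}
```

Any item/folder create or delete that touches lsProjectionSourceUpdatedAt then automatically invalidates the cached snapshot on the next ls call.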

Sequence Diagram(s)

sequenceDiagram
    participant Client as Chat Client
    participant Provider as JAF Provider
    participant KBFlow as KB Flow Handler
    participant Repo as KB Repository
    participant DB as Database
    participant Vespa as Vespa Search

    rect rgba(100, 150, 255, 0.5)
    Note over Client,Vespa: Knowledge Base LS (List) Flow
    Client->>Provider: Call ls tool (target, pagination)
    Provider->>KBFlow: executeLsKnowledgeBase(context, params)
    KBFlow->>KBFlow: resolveKnowledgeBaseTarget(target)
    KBFlow->>Repo: listScopedCollections(scope)
    Repo->>DB: Query collections, items
    DB-->>Repo: Collection & item data
    KBFlow->>KBFlow: buildLsEntries (projection-based or direct)
    KBFlow->>DB: hydrateCollectionItemsForEntries
    DB-->>KBFlow: Enriched items with metadata
    KBFlow-->>Provider: LS entries array
    Provider-->>Client: Formatted list response
    end

    rect rgba(150, 200, 100, 0.5)
    Note over Client,Vespa: Knowledge Base Search Flow
    Client->>Provider: Call search KB tool (query, selections)
    Provider->>KBFlow: executeSearchKnowledgeBase(context, params)
    KBFlow->>KBFlow: resolveKnowledgeBaseTarget (multiple targets)
    KBFlow->>KBFlow: buildSearchSelections (scope filtering)
    KBFlow->>KBFlow: toSelectionForResolvedTarget (expand folders)
    KBFlow->>Vespa: executeVespaSearch(query, selections, filters)
    Vespa-->>KBFlow: Fragment results with docIds
    KBFlow->>DB: buildItemCanonicalPath (for citations)
    DB-->>KBFlow: Path strings
    KBFlow-->>Provider: Fragments with canonical paths
    Provider-->>Client: Search results with sources
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~110 minutes

Possibly related PRs

Suggested reviewers

  • zereraz
  • shivamashtikar
  • junaid-shirur
  • kalpadhwaryu
  • devesh-juspay

Poem

🐰 A knowledge base blooms, indexed with care,
LS and search flow through the air!
Vespa hops faster, projections rebuild,
With scope-aware scopes, our tooling's fulfilled!
From legacy dust to flows crystal-clear,
The KB's new dance brings joy and good cheer! 🌱

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 55.93%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check ❓ Inconclusive: The title contains multiple typos and unclear abbreviations that obscure the main changes; 'addtiion' should be 'addition', and 'partion' likely means 'partition'. Resolution: fix the typos and use complete terms. Consider: 'feat: add ls knowledge base tool, search partition updates, and JAF logging' to better reflect the significant architectural changes across multiple modules.

✅ Passed checks (1 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.



@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the agentic architecture by introducing a new ls tool for browsing Knowledge Base content, complementing the existing searchKnowledgeBase tool with more granular targeting capabilities. It also brings substantial improvements to JAF logging for better system observability and refines tool descriptions to guide agents more effectively. These changes aim to improve the precision and efficiency of agent interactions with various data sources, particularly the Knowledge Base.

Highlights

  • New Knowledge Base Browsing Tool (ls): Introduced a new ls tool for the Knowledge Base, enabling agents to browse collections, folders, and files, discover accessible content, and inspect metadata. This tool supports depth, pagination, and metadata retrieval for precise navigation.
  • Enhanced Knowledge Base Search (searchKnowledgeBase): Refactored and enhanced the searchKnowledgeBase tool to support more precise target filtering, allowing agents to narrow searches to specific collections, folders, files, or paths. This improves search accuracy and relevance.
  • Improved JAF Logging: Added a new jaf-logging.ts file and integrated it across the JAF adapter and provider. This provides detailed, structured logging for JAF trace events, tool calls, and model interactions, significantly improving observability and debugging capabilities.
  • Refined Tool Descriptions: Updated descriptions for various search tools (Global, Gmail, Drive, Calendar, Contacts, Slack) to provide clearer guidance to agents on when and how to use each tool effectively, including details on parameters and their intended use.
  • Knowledge Base Projection Caching: Implemented a caching mechanism for Knowledge Base ls projections, storing pre-built navigation snapshots to optimize performance for repeated browsing requests. This includes logic for rebuilding stale projections and error handling.
  • Excluded IDs for Search Tools: Introduced excludedIds parameters across various search tools (Global, Gmail, Drive, Calendar, Contacts, Slack) and the Knowledge Base search, allowing agents to prevent re-retrieval of previously seen documents, improving search efficiency.


Changelog
  • server/api/chat/agentPromptCreation.ts
    • Added knowledgeBaseWorkflow section to the agent prompt sections.
  • server/api/chat/jaf-adapter.ts
    • Reordered imports for consistency.
    • Changed tool registration logging from info to debug.
  • server/api/chat/jaf-logging.ts
    • Added new file to centralize JAF trace event logging with detailed event types and payload summaries.
  • server/api/chat/jaf-provider.ts
    • Reordered imports for consistency.
    • Updated logger initialization to include module context.
    • Replaced console.warn with structured Logger.warn and Logger.debug for image handling and prompt attachment issues.
    • Added userEmail and turnNumber to LLM call logging parameters.
  • server/api/chat/tool-schemas.ts
    • Imported new KB flow schemas and types.
    • Added ls tool schema with detailed description and examples.
    • Updated descriptions for SearchGlobalInputSchema, SearchGmailInputSchema, SearchDriveInputSchema, SearchCalendarInputSchema, SearchGoogleContactsInputSchema, and GetSlackRelatedMessagesInputSchema for clarity.
    • Modified generateToolDescriptions to use formatParameterLines for better parameter display and to show up to two examples.
  • server/api/chat/tools/global/index.ts
    • Updated searchGlobal tool description for better agent guidance.
  • server/api/chat/tools/google/calendar.ts
    • Updated searchCalendarEvents tool description and refined descriptions for attendees and status parameters.
  • server/api/chat/tools/google/contacts.ts
    • Updated searchGoogleContacts tool description for better agent guidance.
  • server/api/chat/tools/google/drive.ts
    • Updated searchDriveFiles tool description and refined descriptions for owner and filetype parameters.
  • server/api/chat/tools/google/gmail.ts
    • Updated searchGmail tool description and refined descriptions for participants and labels parameters.
  • server/api/chat/tools/knowledgeBase.ts
    • Removed file, as its functionality was refactored and moved to knowledgeBaseFlow.ts.
  • server/api/chat/tools/knowledgeBaseFlow.ts
    • Added new file implementing ls and searchKnowledgeBase tools.
    • Introduced detailed logic for Knowledge Base navigation, projection caching, and target resolution.
    • Defined KnowledgeBaseTargetSchema for precise KB location targeting.
  • server/api/chat/tools/schemas.ts
    • Added excludedIdsSchema for filtering previously seen document IDs.
    • Refined descriptions for limit, offset, sortBy, and timeRange schemas.
  • server/api/chat/tools/slack/getSlackMessages.ts
    • Updated getSlackRelatedMessages tool description and refined descriptions for channelName, user, and mentions parameters.
    • Added excludedIds filtering to the execute function.
  • server/db/knowledgeBase.ts
    • Added touchCollectionLsProjectionSource and touchCollectionLsProjectionSources to update projection source timestamps.
    • Implemented getCollectionLsProjection, upsertCollectionLsProjection, and recordCollectionLsProjectionError for managing KB projections.
    • Updated softDeleteCollection, softDeleteCollectionItem, createFolder, and createFileItem to trigger lsProjectionSourceUpdatedAt updates.
  • server/db/schema/knowledgeBase.ts
    • Added lsProjectionSourceUpdatedAt column to the collections table.
    • Introduced new collectionLsProjections table to store pre-built navigation projections for collections, including projection, builtFromSourceUpdatedAt, and lastError fields.
    • Defined relations for collectionLsProjections.
  • server/queue/fileProcessor.ts
    • Added mergeCollectionItemMetadata utility function.
    • Improved handling of parentId in handleRetryFailure for file processing.
    • Updated metadata merging logic when inserting Vespa documents to ensure existing metadata is preserved.
  • server/search/utils.ts
    • Added detailed logging for extractCollectionVespaIds to show translated Vespa field filters.
  • server/search/vespa.ts
    • Enhanced logging for searchVespaAgent to include more comprehensive details about incoming options and prepared Vespa search inputs.
  • server/tests/knowledgeBaseFlow.test.ts
    • Added new test file to cover the logic of knowledgeBaseFlow.ts, including canonicalizeKnowledgeBasePath, lsKnowledgeBase, and searchKnowledgeBase functionality.
  • server/tests/knowledgeBaseTargetTranslation.test.ts
    • Added new test file to verify the correct translation of Knowledge Base targets into Vespa queries.
  • server/tests/messageAgentsFragments.test.ts
    • Adjusted test timeout for excludedIds injection test.
Activity
  • A new file server/api/chat/jaf-logging.ts was added to centralize JAF trace event logging, indicating a new focus on detailed observability for agentic workflows.
  • The server/api/chat/tools/knowledgeBase.ts file was removed and its functionality was refactored into a new file server/api/chat/tools/knowledgeBaseFlow.ts, suggesting a significant overhaul and enhancement of Knowledge Base related tools.
  • New test files server/tests/knowledgeBaseFlow.test.ts and server/tests/knowledgeBaseTargetTranslation.test.ts were introduced, demonstrating comprehensive testing for the new Knowledge Base features and their integration with Vespa.
  • Logging within server/api/chat/jaf-provider.ts, server/search/utils.ts, and server/search/vespa.ts has been enhanced, replacing console.warn with structured Logger calls and adding more detailed information about search parameters and selections, which will aid in debugging and understanding agent behavior.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant feature for knowledge base interaction, including a new ls tool and improved searchKnowledgeBase capabilities; it also enhances agent performance through improved prompts and tool schemas and refactors logging for better observability. However, a security audit identified issues related to the logging of sensitive information (PII and prompts) and an insecure order of operations in the KB tools that could lead to resource exhaustion. Additionally, the knowledgeBaseFlow.ts file is very large and could benefit from modularization, and there is an inefficiency and potential correctness issue in how excludedIds are handled in the Slack search tool that needs addressing.

Comment on lines +210 to +216
const excludedDocIds = new Set(params.excludedIds || [])
if (excludedDocIds.size > 0) {
  allItems = allItems.filter((item) => {
    const citation = searchToCitation(item)
    return !excludedDocIds.has(citation.docId)
  })
}

Severity: high

Filtering excludedIds in memory after fetching results is inefficient and can lead to incorrect pagination. For instance, if a page of results is entirely filtered out, the tool will return empty-handed instead of fetching the next page. This filtering should be pushed down into the searchSlackMessages function to be handled at the search engine level for correctness and performance.
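If pushing the filter into the search engine is not immediately feasible, the pagination problem can also be mitigated by fetching pages until enough non-excluded items are collected. The sketch below illustrates that loop; fetchPage stands in for the underlying searchSlackMessages call, and all names and signatures here are illustrative, not the actual tool API.

```typescript
// Pagination-aware exclusion: keep pulling pages until `limit` items
// survive the excludedIds filter or the upstream source is exhausted.
type Item = { docId: string }

async function searchWithExclusions(
  fetchPage: (offset: number, limit: number) => Promise<Item[]>,
  excludedIds: Set<string>,
  limit: number,
): Promise<Item[]> {
  const results: Item[] = []
  let offset = 0
  while (results.length < limit) {
    const page = await fetchPage(offset, limit)
    if (page.length === 0) break // no more results upstream
    for (const item of page) {
      if (!excludedIds.has(item.docId) && results.length < limit) {
        results.push(item)
      }
    }
    offset += page.length
  }
  return results
}
```

This keeps the tool from returning empty-handed when an entire page is excluded, though engine-level filtering remains the better fix for both correctness and cost.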

Comment on lines +22 to +74
function summarizeToolResultPayload(result: any): string {
  if (!result) {
    return "No result returned."
  }
  const summaryCandidates: Array<unknown> = [
    result?.data?.summary,
    result?.data?.result,
  ]
  for (const candidate of summaryCandidates) {
    if (typeof candidate === "string" && candidate.trim().length > 0) {
      return truncateValue(candidate.trim(), 200)
    }
  }
  if (typeof result?.data === "string") {
    return truncateValue(result.data, 200)
  }
  try {
    return truncateValue(JSON.stringify(result?.data ?? result), 200)
  } catch {
    return "Result unavailable."
  }
}

function formatToolArgumentsForLogging(args: Record<string, unknown>): string {
  if (!args || typeof args !== "object") {
    return "{}"
  }
  const entries = Object.entries(args)
  if (entries.length === 0) {
    return "{}"
  }
  const parts = entries.map(([key, value]) => {
    let serialized: string
    if (typeof value === "string") {
      serialized = `"${truncateValue(value, 80)}"`
    } else if (
      typeof value === "number" ||
      typeof value === "boolean" ||
      value === null
    ) {
      serialized = String(value)
    } else {
      try {
        serialized = truncateValue(JSON.stringify(value), 80)
      } catch {
        serialized = "[unserializable]"
      }
    }
    return `${key}: ${serialized}`
  })
  const combined = parts.join(", ")
  return truncateValue(combined, 400)
}

Severity: medium (security)

The logging utility functions summarizeToolResultPayload and formatToolArgumentsForLogging truncate values but do not redact sensitive information. Tool arguments and results can contain PII (e.g., user emails, names) or sensitive data (e.g., secrets, internal business info) which will be stored in plain text in the logs.

Recommendation: Implement a redaction layer to identify and mask sensitive patterns before logging.
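A minimal redaction layer along those lines could run before truncation and serialization. In the sketch below, the key list and the email pattern are illustrative; a production version would cover more patterns (phone numbers, bearer tokens, etc.) and would be applied inside both logging helpers.

```typescript
// Hypothetical redaction pass for log payloads: masks values stored
// under known-sensitive keys and scrubs email-shaped substrings.
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.-]+/g
const SENSITIVE_KEYS = new Set(["token", "apiKey", "password", "secret"])

function redactForLogging(value: unknown, key?: string): unknown {
  if (key !== undefined && SENSITIVE_KEYS.has(key)) return "[REDACTED]"
  if (typeof value === "string") return value.replace(EMAIL_RE, "[EMAIL]")
  if (Array.isArray(value)) return value.map((v) => redactForLogging(v))
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [
        k,
        redactForLogging(v, k),
      ]),
    )
  }
  return value
}
```

Calling redactForLogging on args before formatToolArgumentsForLogging (and on result.data before summarizeToolResultPayload) would keep the existing truncation behavior while preventing raw PII from reaching the logs.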

Comment on lines +532 to +540
Logger.debug(
  {
    email: userEmail,
    turn: turnNumber,
    model: actualModelId,
    prompt: sanitizedPrompt,
  },
  "[JAF Provider] FULL PROMPT/MESSAGES ARRAY SENT TO LLM",
)

Severity: medium (security)

The application logs the full prompt array sent to the LLM at the debug level. While file data is sanitized, the text content of all messages in the conversation history is logged in plain text. This results in PII and potentially sensitive user data being stored in the logs.

Recommendation: Avoid logging full prompt arrays in production, or ensure they are only logged at a trace level that is disabled by default.

Comment on lines +1231 to +1240
const cache = createNavigationCache()
const scopedCollectionIds = new Set(
  collections.map((collection) => collection.id),
)
const resolvedTarget = await resolveKnowledgeBaseTarget(
  params.target,
  repo,
  cache,
)


Severity: medium (security)

The ls tool resolves the requested target before verifying if the user has permission to access it. The resolution process for a collection target (via resolveKnowledgeBaseTarget) involves loading all its items from the database and potentially rebuilding a projection snapshot in memory. An attacker could provide a valid UUID of a large collection they do not own to force the server to perform these expensive operations, leading to resource exhaustion or denial of service.

Recommendation: Perform permission checks as early as possible. For collection and path targets, verify the collectionId against the user's allowed collections before calling resolveKnowledgeBaseTarget.

Suggested change (before):

const cache = createNavigationCache()
const scopedCollectionIds = new Set(
  collections.map((collection) => collection.id),
)
const resolvedTarget = await resolveKnowledgeBaseTarget(
  params.target,
  repo,
  cache,
)

Suggested change (after):

const cache = createNavigationCache()
const scopedCollectionIds = new Set(
  collections.map((collection) => collection.id),
)
if (params.target.type === "collection" || params.target.type === "path") {
  if (!scopedCollectionIds.has(params.target.collectionId)) {
    return ToolResponse.error(
      ToolErrorCodes.PERMISSION_DENIED,
      "Requested knowledge base target is outside the current KB scope",
      { toolName: "ls" },
    )
  }
}
const resolvedTarget = await resolveKnowledgeBaseTarget(
  params.target,
  repo,
  cache,
)

@@ -0,0 +1,1438 @@
import type { Tool } from "@xynehq/jaf"

Severity: medium

This new file is over 1400 lines and contains a lot of complex logic for schemas, database interaction, caching, and tool execution. For better long-term maintainability, consider refactoring this into smaller, more focused modules. For example:

  • knowledgeBaseFlow/schemas.ts for Zod schemas.
  • knowledgeBaseFlow/repository.ts for the KnowledgeBaseRepository.
  • knowledgeBaseFlow/projection.ts for projection and caching logic.
  • knowledgeBaseFlow/executor.ts for the tool execution logic.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/db/knowledgeBase.ts (1)

326-336: ⚠️ Potential issue | 🟠 Major

Wrap softDeleteCollectionItem and createFileItem in same transaction for OVERWRITE scenario.

At line 1573, softDeleteCollectionItem(db, ...) executes outside the transaction context, while createFileItem is wrapped in a separate db.transaction() at line 1620. This creates an atomicity gap: if file creation fails after the soft delete commits, the database will be left inconsistent with the old file deleted but no new file created.

The correct pattern is shown at line 1843 in the delete handler, where softDeleteCollectionItem(tx, itemId) and related cleanup operations are grouped within the same transaction block. Apply this same approach to the OVERWRITE branch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/db/knowledgeBase.ts` around lines 326 - 336, The OVERWRITE flow
currently calls softDeleteCollectionItem(db, ...) outside the transaction while
createFileItem(...) runs inside db.transaction(), leading to non-atomic
behavior; change the OVERWRITE branch so that softDeleteCollectionItem,
createFileItem, and any related cleanup (e.g.,
touchCollectionLsProjectionSource) are all executed inside the same
db.transaction callback (use the transaction object tx and pass tx into
softDeleteCollectionItem(tx, ...), createFileItem(tx, ...), and related helpers)
to ensure the delete and new-file creation commit or roll back together.
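The fix described in the prompt can be modeled in miniature: operations buffered on a transaction object commit only if the whole callback succeeds, so a failed create rolls back the soft delete. The db object and operation names below are simplified stand-ins for the drizzle instance and the real helpers in server/db/knowledgeBase.ts.

```typescript
// Toy transaction model demonstrating the atomicity requirement for the
// OVERWRITE branch: soft delete + create must commit or fail together.
type Tx = { ops: string[] }

function makeDb() {
  const committed: string[] = []
  return {
    committed,
    async transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T> {
      const tx: Tx = { ops: [] }
      const result = await fn(tx) // a throw here skips the commit below
      committed.push(...tx.ops)
      return result
    },
  }
}

async function softDeleteCollectionItem(tx: Tx, itemId: string): Promise<void> {
  tx.ops.push(`delete:${itemId}`)
}

async function createFileItem(tx: Tx, name: string, fail = false): Promise<void> {
  if (fail) throw new Error("create failed")
  tx.ops.push(`create:${name}`)
}

// OVERWRITE branch with both steps inside the same transaction.
async function overwriteFileItem(
  db: ReturnType<typeof makeDb>,
  itemId: string,
  name: string,
  fail = false,
): Promise<void> {
  await db.transaction(async (tx) => {
    await softDeleteCollectionItem(tx, itemId)
    await createFileItem(tx, name, fail)
  })
}
```

With drizzle, the same shape applies: pass the tx handle from db.transaction into every helper (including the projection touch) instead of the outer db.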
🧹 Nitpick comments (7)
server/queue/fileProcessor.ts (1)

304-312: Potential path construction inconsistency.

The reconstructedFilePath variable handles both root and nested cases (lines 305-307), but it's only used in the root case (line 311). For nested paths, line 312 uses file.fileName directly without the reconstructedFilePath.

This appears intentional based on the comments, but the naming of reconstructedFilePath is misleading since it's only meaningful for the root case where it equals file.fileName anyway. Consider simplifying:

-      const reconstructedFilePath =
-        targetPath === "/"
-          ? file.fileName
-          : targetPath.substring(1) + file.fileName

       let vespaFileName =
         targetPath === "/"
-          ? file.collectionName + targetPath + reconstructedFilePath // Uses full path for root
+          ? file.collectionName + "/" + file.fileName // Root: collectionName/fileName
           : file.collectionName + targetPath + file.fileName // Uses filename for nested
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/queue/fileProcessor.ts` around lines 304 - 312, reconstructedFilePath
is misleading and unused for nested paths; simplify by removing
reconstructedFilePath and build vespaFileName directly from targetPath,
file.collectionName, and file.fileName (use targetPath === "/" ?
file.collectionName + "/" + file.fileName : file.collectionName + targetPath +
file.fileName or equivalent) so the logic is consistent and variable names
reflect actual usage; update the code paths that reference reconstructedFilePath
and ensure targetPath formatting (leading slash) is handled once when composing
vespaFileName.
server/search/utils.ts (1)

266-276: Consider using Logger.debug instead of Logger.info for this detailed translation log.

This function may be called frequently during search operations. Logging at info level could generate significant volume in production. The detailed payload (including all collection/folder/file IDs) is valuable for debugging but may be excessive for standard operation logs.

🔧 Suggested change
-  Logger.info(
+  Logger.debug(
     {
       collectionSelections: options.collectionSelections,
       vespaFieldTranslation: {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/search/utils.ts` around lines 266 - 276, The log in
extractCollectionVespaIds is too verbose for info level; change the Logger.info
call to Logger.debug so the detailed payload (options.collectionSelections and
the vespaFieldTranslation object using result.collectionIds,
result.collectionFolderIds, result.collectionFileIds) is only emitted at debug
level during troubleshooting; ensure the same message string
"[extractCollectionVespaIds] Translated collection selections to Vespa field
filters" and payload structure are preserved when switching to Logger.debug.
server/api/chat/tools/knowledgeBaseFlow.ts (2)

624-649: Potential N+1 query pattern in agent-scoped collection listing.

When scope === KnowledgeBaseScope.AgentScoped, the code iterates over baseSelections and calls repo.getCollectionItemById() individually for each folder and file ID, then repo.getCollectionById() for each unique collection. For agents with many selected items, this could result in numerous sequential DB queries.

Consider batching these lookups to reduce database round-trips:

♻️ Suggested batch optimization
+  // Collect all item IDs first
+  const allFolderIds: string[] = []
+  const allFileIds: string[] = []
+  for (const selection of scopeState.baseSelections) {
+    selection.collectionIds?.forEach((id) => collectionIds.add(id))
+    allFolderIds.push(...(selection.collectionFolderIds ?? []))
+    allFileIds.push(...(selection.collectionFileIds ?? []))
+  }
+
+  // Batch fetch all items in one query (requires repo method)
+  const allItems = await repo.getCollectionItemsByIds?.([...allFolderIds, ...allFileIds]) ?? []
+  for (const item of allItems) {
+    collectionIds.add(item.collectionId)
+  }

This would require adding a batch lookup method to KnowledgeBaseRepository.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/chat/tools/knowledgeBaseFlow.ts` around lines 624 - 649, The loop
in the function that builds collectionIds from scopeState.baseSelections causes
an N+1 query pattern by calling repo.getCollectionItemById() for each
folder/file and repo.getCollectionById() for each collection; modify this to
batch-fetch: add batch methods to KnowledgeBaseRepository (e.g.,
getCollectionItemsByIds(ids: string[]) and getCollectionsByIds(ids: string[])),
collect all folder/file ids first, call getCollectionItemsByIds once, extract
collectionIds, then call getCollectionsByIds once and return the filtered
collections, replacing the per-item repo.getCollectionItemById and
per-collection repo.getCollectionById calls.

1383-1389: Consider returning an empty success response instead of an error for no results.

Returning ToolResponse.error with EXECUTION_FAILED when no results are found may be semantically misleading—the tool executed correctly, it just found nothing. Returning an empty success response allows the agent to distinguish between "search worked but found nothing" and "search failed due to an error."

♻️ Alternative: return empty success
     if (!fragments.length) {
-      return ToolResponse.error(
-        ToolErrorCodes.EXECUTION_FAILED,
-        "No knowledge base results found for the query.",
-        { toolName: "searchKnowledgeBase" },
-      )
+      return ToolResponse.success({
+        result: "No knowledge base results found for the query.",
+        contexts: [],
+      })
     }
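The distinction can be shown with a toy result union; `ToolResult` below is a stand-in for illustration, not the project's actual `ToolResponse` type:

```typescript
// "No results" is a successful execution with an empty payload,
// not an EXECUTION_FAILED error.
type ToolResult =
  | { ok: true; contexts: string[]; result: string }
  | { ok: false; code: "EXECUTION_FAILED"; message: string }

function searchResult(fragments: string[]): ToolResult {
  if (fragments.length === 0) {
    return {
      ok: true,
      contexts: [],
      result: "No knowledge base results found for the query.",
    }
  }
  return { ok: true, contexts: fragments, result: `${fragments.length} result(s)` }
}

// The agent can now branch on ok vs. empty instead of treating both as failures.
console.log(searchResult([]).ok, searchResult(["hit"]).ok)
```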
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/chat/tools/knowledgeBaseFlow.ts` around lines 1383 - 1389, The
current branch returns ToolResponse.error with ToolErrorCodes.EXECUTION_FAILED
when the fragments array is empty, which semantically treats "no results" as a
failure; instead return a successful ToolResponse indicating zero results (e.g.,
use ToolResponse.success or the codebase's success wrapper) with an empty
results payload (e.g., { fragments: [] } or similar) and preserve the tool
metadata (toolName: "searchKnowledgeBase") so callers can distinguish "no
results" from an execution error; update the empty-case in the function handling
fragments to return that empty-success response rather than ToolResponse.error.
server/api/chat/tool-schemas.ts (2)

110-140: Consider consolidating duplicate participantsSchema definitions.

The MailParticipantSchema defined here duplicates the participantsSchema in server/api/chat/tools/google/gmail.ts (as shown in the context snippet). Both define the same structure with from, to, cc, bcc string arrays. Consider having one canonical definition and importing it where needed to avoid drift.

♻️ Option: Export from one location

Either export MailParticipantSchema from tool-schemas.ts and import it in gmail.ts, or vice versa, to maintain a single source of truth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/chat/tool-schemas.ts` around lines 110 - 140, The
MailParticipantSchema defined here duplicates participantsSchema in gmail.ts;
pick one canonical export (either MailParticipantSchema or participantsSchema),
move the canonical definition to that file, export it, and update the other
module to import and use that exported symbol instead of redefining it so there
is a single source of truth for the { from, to, cc, bcc } string-array
participant schema.

1004-1034: Depth limit of 1 may truncate useful nested parameter docs.

The if (depth >= 1) continue check stops recursion after one level of nesting. For schemas with deeper structures (like filters.targets[].type), nested parameter descriptions won't appear in generated docs. This may be intentional to keep output concise, but verify this meets documentation needs.

💡 Consider increasing depth or making it configurable
 function formatParameterLines(
   schema: JSONSchema7,
   pathPrefix = "",
   depth = 0,
+  maxDepth = 2,
 ): string[] {
   ...
-    if (depth >= 1) continue
+    if (depth >= maxDepth) continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/chat/tool-schemas.ts` around lines 1004 - 1034, The function
formatParameterLines currently aborts recursion after one nested level via the
check if (depth >= 1) continue, which truncates deeper schema docs; update
formatParameterLines to accept and use a configurable maxDepth parameter (e.g.,
maxDepth = 2 or passed from the caller) instead of the hardcoded 1, replace the
depth check with if (depth >= maxDepth) continue, and ensure recursive calls
pass the maxDepth value through so deeply nested properties (e.g.,
filters.targets[].type) are included when desired.
server/api/chat/jaf-provider.ts (1)

859-863: Minor: Consider simplifying the buffer length check.

The nested ternary for data size detection works but could be slightly cleaner.

♻️ Optional simplification
-        const dataSize = Buffer.isBuffer(filePart.data)
-          ? filePart.data.length
-          : filePart.data instanceof Uint8Array
-            ? filePart.data.length
-            : 0
+        const dataSize =
+          filePart.data instanceof Uint8Array ? filePart.data.length : 0

Buffer in Node.js is a subclass of Uint8Array, so the single instanceof Uint8Array check covers both cases.
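The subclass relationship is easy to verify in Node; `dataSize` below mirrors the simplified check:

```typescript
// Buffer extends Uint8Array in Node.js, so one instanceof branch
// covers both byte-array inputs.
function dataSize(data: unknown): number {
  return data instanceof Uint8Array ? data.length : 0
}

const buf = Buffer.from("hello")
const arr = new Uint8Array([1, 2, 3])

console.log(dataSize(buf), dataSize(arr), dataSize("not bytes")) // → 5 3 0
```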

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/chat/jaf-provider.ts` around lines 859 - 863, The dataSize
computation uses a nested ternary checking Buffer and Uint8Array separately;
simplify by relying on Uint8Array covering Buffer so replace the nested check in
the dataSize assignment for filePart with a single instanceof Uint8Array branch
(or equivalent truthy length check) to compute length when filePart.data is a
byte array and otherwise default to 0; update the expression that sets dataSize
(the variable named dataSize and the inspection of filePart.data) accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/queue/fileProcessor.ts`:
- Around line 345-366: The metadata JSON.stringify block currently spreads
file.metadata last which lets stale keys in file.metadata override freshly
computed values (e.g., chunksCount, imageChunksCount, processingMethod); change
the order so that file.metadata is spread first (before computed fields like
chunksCount, imageChunksCount, processingMethod, sheetName/index/totalSheets and
pageTitle) so the computed values from processingResult override any keys in
file.metadata—ensure this mirrors the behavior of mergeCollectionItemMetadata
where updates override baseMetadata.

In `@server/tests/knowledgeBaseTargetTranslation.test.ts`:
- Around line 353-367: Update the test mock to use the enum value instead of a
string literal: replace entity: "file" with entity: KnowledgeBaseEntity.File in
the returned fragment and add an import for KnowledgeBaseEntity at the top of
the test file (alongside Apps import) so the symbol is available; ensure the
mock object still uses Apps.KnowledgeBase for app and keep the rest unchanged.

---

Outside diff comments:
In `@server/db/knowledgeBase.ts`:
- Around line 326-336: The OVERWRITE flow currently calls
softDeleteCollectionItem(db, ...) outside the transaction while
createFileItem(...) runs inside db.transaction(), leading to non-atomic
behavior; change the OVERWRITE branch so that softDeleteCollectionItem,
createFileItem, and any related cleanup (e.g.,
touchCollectionLsProjectionSource) are all executed inside the same
db.transaction callback (use the transaction object tx and pass tx into
softDeleteCollectionItem(tx, ...), createFileItem(tx, ...), and related helpers)
to ensure the delete and new-file creation commit or roll back together.

---

Nitpick comments:
In `@server/api/chat/jaf-provider.ts`:
- Around line 859-863: The dataSize computation uses a nested ternary checking
Buffer and Uint8Array separately; simplify by relying on Uint8Array covering
Buffer so replace the nested check in the dataSize assignment for filePart with
a single instanceof Uint8Array branch (or equivalent truthy length check) to
compute length when filePart.data is a byte array and otherwise default to 0;
update the expression that sets dataSize (the variable named dataSize and the
inspection of filePart.data) accordingly.

In `@server/api/chat/tool-schemas.ts`:
- Around line 110-140: The MailParticipantSchema defined here duplicates
participantsSchema in gmail.ts; pick one canonical export (either
MailParticipantSchema or participantsSchema), move the canonical definition to
that file, export it, and update the other module to import and use that
exported symbol instead of redefining it so there is a single source of truth
for the { from, to, cc, bcc } string-array participant schema.
- Around line 1004-1034: The function formatParameterLines currently aborts
recursion after one nested level via the check if (depth >= 1) continue, which
truncates deeper schema docs; update formatParameterLines to accept and use a
configurable maxDepth parameter (e.g., maxDepth = 2 or passed from the caller)
instead of the hardcoded 1, replace the depth check with if (depth >= maxDepth)
continue, and ensure recursive calls pass the maxDepth value through so deeply
nested properties (e.g., filters.targets[].type) are included when desired.

In `@server/api/chat/tools/knowledgeBaseFlow.ts`:
- Around line 624-649: The loop in the function that builds collectionIds from
scopeState.baseSelections causes an N+1 query pattern by calling
repo.getCollectionItemById() for each folder/file and repo.getCollectionById()
for each collection; modify this to batch-fetch: add batch methods to
KnowledgeBaseRepository (e.g., getCollectionItemsByIds(ids: string[]) and
getCollectionsByIds(ids: string[])), collect all folder/file ids first, call
getCollectionItemsByIds once, extract collectionIds, then call
getCollectionsByIds once and return the filtered collections, replacing the
per-item repo.getCollectionItemById and per-collection repo.getCollectionById
calls.
- Around line 1383-1389: The current branch returns ToolResponse.error with
ToolErrorCodes.EXECUTION_FAILED when the fragments array is empty, which
semantically treats "no results" as a failure; instead return a successful
ToolResponse indicating zero results (e.g., use ToolResponse.success or the
codebase's success wrapper) with an empty results payload (e.g., { fragments: []
} or similar) and preserve the tool metadata (toolName: "searchKnowledgeBase")
so callers can distinguish "no results" from an execution error; update the
empty-case in the function handling fragments to return that empty-success
response rather than ToolResponse.error.

In `@server/queue/fileProcessor.ts`:
- Around line 304-312: reconstructedFilePath is misleading and unused for nested
paths; simplify by removing reconstructedFilePath and build vespaFileName
directly from targetPath, file.collectionName, and file.fileName (use targetPath
=== "/" ? file.collectionName + "/" + file.fileName : file.collectionName +
targetPath + file.fileName or equivalent) so the logic is consistent and
variable names reflect actual usage; update the code paths that reference
reconstructedFilePath and ensure targetPath formatting (leading slash) is
handled once when composing vespaFileName.

In `@server/search/utils.ts`:
- Around line 266-276: The log in extractCollectionVespaIds is too verbose for
info level; change the Logger.info call to Logger.debug so the detailed payload
(options.collectionSelections and the vespaFieldTranslation object using
result.collectionIds, result.collectionFolderIds, result.collectionFileIds) is
only emitted at debug level during troubleshooting; ensure the same message
string "[extractCollectionVespaIds] Translated collection selections to Vespa
field filters" and payload structure are preserved when switching to
Logger.debug.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6f392373-d1b5-4a0f-9832-08485239614a

📥 Commits

Reviewing files that changed from the base of the PR and between f16ea5e and f44da7e.

📒 Files selected for processing (23)
  • server/api/chat/agentPromptCreation.ts
  • server/api/chat/jaf-adapter.ts
  • server/api/chat/jaf-logging.ts
  • server/api/chat/jaf-provider.ts
  • server/api/chat/message-agents.ts
  • server/api/chat/tool-schemas.ts
  • server/api/chat/tools/global/index.ts
  • server/api/chat/tools/google/calendar.ts
  • server/api/chat/tools/google/contacts.ts
  • server/api/chat/tools/google/drive.ts
  • server/api/chat/tools/google/gmail.ts
  • server/api/chat/tools/knowledgeBase.ts
  • server/api/chat/tools/knowledgeBaseFlow.ts
  • server/api/chat/tools/schemas.ts
  • server/api/chat/tools/slack/getSlackMessages.ts
  • server/db/knowledgeBase.ts
  • server/db/schema/knowledgeBase.ts
  • server/queue/fileProcessor.ts
  • server/search/utils.ts
  • server/search/vespa.ts
  • server/tests/knowledgeBaseFlow.test.ts
  • server/tests/knowledgeBaseTargetTranslation.test.ts
  • server/tests/messageAgentsFragments.test.ts
💤 Files with no reviewable changes (1)
  • server/api/chat/tools/knowledgeBase.ts

Comment on lines 345 to 366
        metadata: JSON.stringify({
          originalFileName: file.originalName || file.fileName,
          uploadedBy: file.uploadedByEmail || "system",
-         chunksCount: processingResult.chunks.length + processingResult.image_chunks.length,
+         chunksCount:
+           processingResult.chunks.length +
+           processingResult.image_chunks.length,
          imageChunksCount: processingResult.image_chunks.length,
          processingMethod: getBaseMimeType(file.mimeType || "text/plain"),
-         ...(processingResult.processingMethod && { pdfProcessingMethod: processingResult.processingMethod }),
+         ...(processingResult.processingMethod && {
+           pdfProcessingMethod: processingResult.processingMethod,
+         }),
          ...(pageTitle && { pageTitle }),
          lastModified: Date.now(),
-         ...(('sheetName' in processingResult) && {
+         ...("sheetName" in processingResult && {
            sheetName: (processingResult as SheetProcessingResult).sheetName,
            sheetIndex: (processingResult as SheetProcessingResult).sheetIndex,
-           totalSheets: (processingResult as SheetProcessingResult).totalSheets,
+           totalSheets: (processingResult as SheetProcessingResult)
+             .totalSheets,
          }),
-         ...(typeof file.metadata === "object" && file.metadata !== null && { ...file.metadata }),
+         ...(typeof file.metadata === "object" &&
+           file.metadata !== null && { ...file.metadata }),
        }),

⚠️ Potential issue | 🟡 Minor

Metadata spread order may cause computed values to be overwritten.

The file.metadata spread at lines 364-365 comes after the computed fields (chunksCount, imageChunksCount, etc.). This means if file.metadata already contains these keys with stale values, they will override the freshly computed values.

This is inconsistent with mergeCollectionItemMetadata (line 388) where updates correctly override baseMetadata.

Consider moving the file.metadata spread to the beginning:

🔧 Suggested fix
        metadata: JSON.stringify({
+         ...(typeof file.metadata === "object" &&
+           file.metadata !== null && { ...file.metadata }),
          originalFileName: file.originalName || file.fileName,
          uploadedBy: file.uploadedByEmail || "system",
          chunksCount:
            processingResult.chunks.length +
            processingResult.image_chunks.length,
          imageChunksCount: processingResult.image_chunks.length,
          processingMethod: getBaseMimeType(file.mimeType || "text/plain"),
          ...(processingResult.processingMethod && {
            pdfProcessingMethod: processingResult.processingMethod,
          }),
          ...(pageTitle && { pageTitle }),
          lastModified: Date.now(),
          ...("sheetName" in processingResult && {
            sheetName: (processingResult as SheetProcessingResult).sheetName,
            sheetIndex: (processingResult as SheetProcessingResult).sheetIndex,
            totalSheets: (processingResult as SheetProcessingResult)
              .totalSheets,
          }),
-         ...(typeof file.metadata === "object" &&
-           file.metadata !== null && { ...file.metadata }),
        }),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/queue/fileProcessor.ts` around lines 345 - 366, The metadata
JSON.stringify block currently spreads file.metadata last which lets stale keys
in file.metadata override freshly computed values (e.g., chunksCount,
imageChunksCount, processingMethod); change the order so that file.metadata is
spread first (before computed fields like chunksCount, imageChunksCount,
processingMethod, sheetName/index/totalSheets and pageTitle) so the computed
values from processingResult override any keys in file.metadata—ensure this
mirrors the behavior of mergeCollectionItemMetadata where updates override
baseMetadata.

Comment on lines +353 to +367
return [
{
id: "fragment-1",
content: "hit",
source: {
docId: "clf-spec",
title: "spec.md",
url: "",
app: Apps.KnowledgeBase,
entity: "file",
},
confidence: 0.9,
},
]
})

⚠️ Potential issue | 🔴 Critical

Fix type error: entity must use enum value, not string literal.

The pipeline failure indicates that entity: "file" (string) is not assignable to the expected entity union type. Use KnowledgeBaseEntity.File instead.

🐛 Proposed fix

First, update the import at line 2:

-import { Apps, SearchModes } from "@xyne/vespa-ts/types"
+import { Apps, KnowledgeBaseEntity, SearchModes } from "@xyne/vespa-ts/types"

Then fix the mock return value:

     return [
       {
         id: "fragment-1",
         content: "hit",
         source: {
           docId: "clf-spec",
           title: "spec.md",
           url: "",
           app: Apps.KnowledgeBase,
-          entity: "file",
+          entity: KnowledgeBaseEntity.File,
         },
         confidence: 0.9,
       },
     ]
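A stand-in enum shows why the bare literal fails type-checking even though the runtime value is the same string; the real `KnowledgeBaseEntity` lives in `@xyne/vespa-ts/types`, so the definition below is illustrative only:

```typescript
// String enums carry their raw string at runtime, but the declared
// union type only accepts the enum member, not the literal.
enum KnowledgeBaseEntity {
  File = "file",
  Folder = "folder",
}

interface FragmentSource {
  entity: KnowledgeBaseEntity
}

const source: FragmentSource = { entity: KnowledgeBaseEntity.File } // compiles
// const bad: FragmentSource = { entity: "file" } // rejected by tsc

console.log(String(source.entity)) // → file
```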
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/tests/knowledgeBaseTargetTranslation.test.ts` around lines 353 - 367,
Update the test mock to use the enum value instead of a string literal: replace
entity: "file" with entity: KnowledgeBaseEntity.File in the returned fragment
and add an import for KnowledgeBaseEntity at the top of the test file (alongside
Apps import) so the symbol is available; ensure the mock object still uses
Apps.KnowledgeBase for app and keep the rest unchanged.
