feat: ls tool addition, KB search partition, JAF logging #1272

Aayushjshah wants to merge 1 commit into main from
Conversation
📝 Walkthrough

This PR implements a comprehensive knowledge base flow system with advanced listing and search capabilities, refactors logging across JAF modules, expands tool schemas with enhanced descriptions, adds database tables for projection caching, and replaces the legacy knowledge base tool with the new flow-based implementation.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Chat Client
    participant Provider as JAF Provider
    participant KBFlow as KB Flow Handler
    participant Repo as KB Repository
    participant DB as Database
    participant Vespa as Vespa Search

    rect rgba(100, 150, 255, 0.5)
        Note over Client,Vespa: Knowledge Base LS (List) Flow
        Client->>Provider: Call ls tool (target, pagination)
        Provider->>KBFlow: executeLsKnowledgeBase(context, params)
        KBFlow->>KBFlow: resolveKnowledgeBaseTarget(target)
        KBFlow->>Repo: listScopedCollections(scope)
        Repo->>DB: Query collections, items
        DB-->>Repo: Collection & item data
        KBFlow->>KBFlow: buildLsEntries (projection-based or direct)
        KBFlow->>DB: hydrateCollectionItemsForEntries
        DB-->>KBFlow: Enriched items with metadata
        KBFlow-->>Provider: LS entries array
        Provider-->>Client: Formatted list response
    end

    rect rgba(150, 200, 100, 0.5)
        Note over Client,Vespa: Knowledge Base Search Flow
        Client->>Provider: Call search KB tool (query, selections)
        Provider->>KBFlow: executeSearchKnowledgeBase(context, params)
        KBFlow->>KBFlow: resolveKnowledgeBaseTarget (multiple targets)
        KBFlow->>KBFlow: buildSearchSelections (scope filtering)
        KBFlow->>KBFlow: toSelectionForResolvedTarget (expand folders)
        KBFlow->>Vespa: executeVespaSearch(query, selections, filters)
        Vespa-->>KBFlow: Fragment results with docIds
        KBFlow->>DB: buildItemCanonicalPath (for citations)
        DB-->>KBFlow: Path strings
        KBFlow-->>Provider: Fragments with canonical paths
        Provider-->>Client: Search results with sources
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~110 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the agentic architecture by introducing a new

Highlights
Code Review
This pull request introduces a significant feature for knowledge base interaction, including a new `ls` tool and improved `searchKnowledgeBase` capabilities; it enhances agent performance through improved prompts and tool schemas and refactors logging for better observability. However, a security audit identified issues with the logging of sensitive information (PII and prompts) and an insecure order of operations in the KB tools that could lead to resource exhaustion. Additionally, the `knowledgeBaseFlow.ts` file is very large and could benefit from modularization, and there is an inefficiency and potential correctness issue in how `excludedIds` are handled in the Slack search tool that needs addressing.
```ts
const excludedDocIds = new Set(params.excludedIds || [])
if (excludedDocIds.size > 0) {
  allItems = allItems.filter((item) => {
    const citation = searchToCitation(item)
    return !excludedDocIds.has(citation.docId)
  })
}
```
Filtering excludedIds in memory after fetching results is inefficient and can lead to incorrect pagination. For instance, if a page of results is entirely filtered out, the tool will return empty-handed instead of fetching the next page. This filtering should be pushed down into the searchSlackMessages function to be handled at the search engine level for correctness and performance.
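A pushdown along those lines could look like the following sketch. This is illustrative only: the real `searchSlackMessages` signature is not shown in this PR, and `buildDocIdExclusionClause` is a hypothetical helper that assumes a Vespa-style YQL query builder. The point is that the exclusion becomes part of the query the engine evaluates, so pagination counts stay correct.

```typescript
// Hypothetical helper: translate excludedIds into a search-engine-level
// filter clause instead of filtering fetched pages in memory.
function buildDocIdExclusionClause(excludedIds: string[]): string {
  if (excludedIds.length === 0) return ""
  // Quote each id defensively; Vespa YQL supports negating an `in` clause.
  const quoted = excludedIds.map((id) => `"${id.replace(/"/g, "")}"`).join(", ")
  return `and !(docId in (${quoted}))`
}
```

The caller would append this clause to the YQL that `searchSlackMessages` already builds, so the engine never returns excluded documents in any page.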
```ts
function summarizeToolResultPayload(result: any): string {
  if (!result) {
    return "No result returned."
  }
  const summaryCandidates: Array<unknown> = [
    result?.data?.summary,
    result?.data?.result,
  ]
  for (const candidate of summaryCandidates) {
    if (typeof candidate === "string" && candidate.trim().length > 0) {
      return truncateValue(candidate.trim(), 200)
    }
  }
  if (typeof result?.data === "string") {
    return truncateValue(result.data, 200)
  }
  try {
    return truncateValue(JSON.stringify(result?.data ?? result), 200)
  } catch {
    return "Result unavailable."
  }
}
```
```ts
function formatToolArgumentsForLogging(args: Record<string, unknown>): string {
  if (!args || typeof args !== "object") {
    return "{}"
  }
  const entries = Object.entries(args)
  if (entries.length === 0) {
    return "{}"
  }
  const parts = entries.map(([key, value]) => {
    let serialized: string
    if (typeof value === "string") {
      serialized = `"${truncateValue(value, 80)}"`
    } else if (
      typeof value === "number" ||
      typeof value === "boolean" ||
      value === null
    ) {
      serialized = String(value)
    } else {
      try {
        serialized = truncateValue(JSON.stringify(value), 80)
      } catch {
        serialized = "[unserializable]"
      }
    }
    return `${key}: ${serialized}`
  })
  const combined = parts.join(", ")
  return truncateValue(combined, 400)
}
```
The logging utility functions summarizeToolResultPayload and formatToolArgumentsForLogging truncate values but do not redact sensitive information. Tool arguments and results can contain PII (e.g., user emails, names) or sensitive data (e.g., secrets, internal business info) which will be stored in plain text in the logs.
Recommendation: Implement a redaction layer to identify and mask sensitive patterns before logging.
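A minimal redaction layer along those lines might look like this sketch; the patterns and masks are illustrative assumptions, not an inventory of everything sensitive in this codebase, and a real deployment would extend the list to match its own data.

```typescript
// Hypothetical redaction layer (not part of this PR): mask common sensitive
// patterns before a value reaches the logger.
const REDACTION_PATTERNS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "[redacted-email]"], // email addresses
  [/Bearer\s+[A-Za-z0-9._-]+/g, "Bearer [redacted]"], // bearer tokens
  [/\b(?:sk|pk|api|key)[-_][A-Za-z0-9]{16,}\b/g, "[redacted-key]"], // API-key-like tokens
]

function redactSensitive(value: string): string {
  // Apply each mask in turn; patterns are independent so order is not significant here.
  return REDACTION_PATTERNS.reduce(
    (acc, [pattern, mask]) => acc.replace(pattern, mask),
    value,
  )
}
```

Calling `redactSensitive` inside `truncateValue` (or just before it) would cover both `summarizeToolResultPayload` and `formatToolArgumentsForLogging` with one change.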
```ts
Logger.debug(
  {
    email: userEmail,
    turn: turnNumber,
    model: actualModelId,
    prompt: sanitizedPrompt,
  },
  "[JAF Provider] FULL PROMPT/MESSAGES ARRAY SENT TO LLM",
)
```
The application logs the full prompt array sent to the LLM at the debug level. While file data is sanitized, the text content of all messages in the conversation history is logged in plain text. This results in PII and potentially sensitive user data being stored in the logs.
Recommendation: Avoid logging full prompt arrays in production, or ensure they are only logged at a trace level that is disabled by default.
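One way to apply that recommendation is an explicit opt-in gate that is off unless deliberately enabled. The flag name `JAF_PROMPT_TRACE` and the helper `logPromptIfTracing` are hypothetical names for illustration; the logger shape mirrors a pino-style `trace` method.

```typescript
// Hypothetical opt-in gate: the full prompt dump only happens when an
// operator explicitly sets JAF_PROMPT_TRACE=1, so production defaults stay silent.
const PROMPT_TRACE_ENABLED = process.env.JAF_PROMPT_TRACE === "1"

function logPromptIfTracing(
  logger: { trace: (obj: unknown, msg: string) => void },
  payload: Record<string, unknown>,
): void {
  if (!PROMPT_TRACE_ENABLED) return // skipped entirely under default settings
  logger.trace(payload, "[JAF Provider] FULL PROMPT/MESSAGES ARRAY SENT TO LLM")
}
```

Because the check happens before the call, the prompt payload is never even serialized when tracing is off.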
```ts
const cache = createNavigationCache()
const scopedCollectionIds = new Set(
  collections.map((collection) => collection.id),
)
const resolvedTarget = await resolveKnowledgeBaseTarget(
  params.target,
  repo,
  cache,
)
```
The ls tool resolves the requested target before verifying if the user has permission to access it. The resolution process for a collection target (via resolveKnowledgeBaseTarget) involves loading all its items from the database and potentially rebuilding a projection snapshot in memory. An attacker could provide a valid UUID of a large collection they do not own to force the server to perform these expensive operations, leading to resource exhaustion or denial of service.
Recommendation: Perform permission checks as early as possible. For collection and path targets, verify the collectionId against the user's allowed collections before calling resolveKnowledgeBaseTarget.
```diff
 const cache = createNavigationCache()
 const scopedCollectionIds = new Set(
   collections.map((collection) => collection.id),
 )
+if (params.target.type === "collection" || params.target.type === "path") {
+  if (!scopedCollectionIds.has(params.target.collectionId)) {
+    return ToolResponse.error(
+      ToolErrorCodes.PERMISSION_DENIED,
+      "Requested knowledge base target is outside the current KB scope",
+      { toolName: "ls" },
+    )
+  }
+}
 const resolvedTarget = await resolveKnowledgeBaseTarget(
   params.target,
   repo,
   cache,
 )
```
```diff
@@ -0,0 +1,1438 @@
+import type { Tool } from "@xynehq/jaf"
```
This new file is over 1400 lines and contains a lot of complex logic for schemas, database interaction, caching, and tool execution. For better long-term maintainability, consider refactoring this into smaller, more focused modules. For example:

- `knowledgeBaseFlow/schemas.ts` for Zod schemas
- `knowledgeBaseFlow/repository.ts` for the `KnowledgeBaseRepository`
- `knowledgeBaseFlow/projection.ts` for projection and caching logic
- `knowledgeBaseFlow/executor.ts` for the tool execution logic
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/db/knowledgeBase.ts (1)
326-336: ⚠️ Potential issue | 🟠 Major

Wrap `softDeleteCollectionItem` and `createFileItem` in the same transaction for the OVERWRITE scenario.

At line 1573, `softDeleteCollectionItem(db, ...)` executes outside the transaction context, while `createFileItem` is wrapped in a separate `db.transaction()` at line 1620. This creates an atomicity gap: if file creation fails after the soft delete commits, the database will be left inconsistent, with the old file deleted but no new file created.

The correct pattern is shown at line 1843 in the delete handler, where `softDeleteCollectionItem(tx, itemId)` and related cleanup operations are grouped within the same transaction block. Apply this same approach to the OVERWRITE branch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/db/knowledgeBase.ts` around lines 326 - 336, The OVERWRITE flow currently calls softDeleteCollectionItem(db, ...) outside the transaction while createFileItem(...) runs inside db.transaction(), leading to non-atomic behavior; change the OVERWRITE branch so that softDeleteCollectionItem, createFileItem, and any related cleanup (e.g., touchCollectionLsProjectionSource) are all executed inside the same db.transaction callback (use the transaction object tx and pass tx into softDeleteCollectionItem(tx, ...), createFileItem(tx, ...), and related helpers) to ensure the delete and new-file creation commit or roll back together.
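The atomic pattern this comment describes can be sketched as follows. The transaction runner and both helpers are stand-ins for the PR's drizzle-based code, reduced to show only what matters: both operations share one transaction, so they commit or roll back together.

```typescript
// Stand-ins for the PR's helpers: each just records the write it would perform.
type Tx = { ops: string[] }

async function softDeleteCollectionItem(tx: Tx, itemId: string): Promise<void> {
  tx.ops.push(`soft-delete:${itemId}`)
}

async function createFileItem(tx: Tx, fileName: string): Promise<void> {
  tx.ops.push(`create:${fileName}`)
}

// Minimal stand-in for drizzle's db.transaction: queued ops are kept only if
// the callback completes ("commit"); a throw discards everything ("rollback").
async function transaction(fn: (tx: Tx) => Promise<void>): Promise<string[]> {
  const tx: Tx = { ops: [] }
  try {
    await fn(tx)
    return tx.ops // commit
  } catch {
    return [] // rollback: nothing persisted
  }
}

async function overwriteFile(itemId: string, fileName: string): Promise<string[]> {
  // Both operations run inside one tx: there is no window where the soft
  // delete is committed but the new file does not yet exist.
  return transaction(async (tx) => {
    await softDeleteCollectionItem(tx, itemId)
    await createFileItem(tx, fileName)
  })
}
```

In the real code this means passing the same `tx` into `softDeleteCollectionItem`, `createFileItem`, and any cleanup helpers inside a single `db.transaction()` callback, as the delete handler already does.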
🧹 Nitpick comments (7)
server/queue/fileProcessor.ts (1)
304-312: Potential path construction inconsistency.

The `reconstructedFilePath` variable handles both root and nested cases (lines 305-307), but it's only used in the root case (line 311). For nested paths, line 312 uses `file.fileName` directly without the `reconstructedFilePath`.

This appears intentional based on the comments, but the naming of `reconstructedFilePath` is misleading since it's only meaningful for the root case, where it equals `file.fileName` anyway. Consider simplifying:

```diff
-const reconstructedFilePath =
-  targetPath === "/"
-    ? file.fileName
-    : targetPath.substring(1) + file.fileName
 let vespaFileName =
   targetPath === "/"
-    ? file.collectionName + targetPath + reconstructedFilePath // Uses full path for root
+    ? file.collectionName + "/" + file.fileName // Root: collectionName/fileName
     : file.collectionName + targetPath + file.fileName // Uses filename for nested
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@server/queue/fileProcessor.ts` around lines 304 - 312, reconstructedFilePath is misleading and unused for nested paths; simplify by removing reconstructedFilePath and build vespaFileName directly from targetPath, file.collectionName, and file.fileName (use targetPath === "/" ? file.collectionName + "/" + file.fileName : file.collectionName + targetPath + file.fileName or equivalent) so the logic is consistent and variable names reflect actual usage; update the code paths that reference reconstructedFilePath and ensure targetPath formatting (leading slash) is handled once when composing vespaFileName.

server/search/utils.ts (1)
266-276: Consider using `Logger.debug` instead of `Logger.info` for this detailed translation log.

This function may be called frequently during search operations. Logging at `info` level could generate significant volume in production. The detailed payload (including all collection/folder/file IDs) is valuable for debugging but may be excessive for standard operation logs.

🔧 Suggested change

```diff
-Logger.info(
+Logger.debug(
   {
     collectionSelections: options.collectionSelections,
     vespaFieldTranslation: {
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@server/search/utils.ts` around lines 266 - 276, The log in extractCollectionVespaIds is too verbose for info level; change the Logger.info call to Logger.debug so the detailed payload (options.collectionSelections and the vespaFieldTranslation object using result.collectionIds, result.collectionFolderIds, result.collectionFileIds) is only emitted at debug level during troubleshooting; ensure the same message string "[extractCollectionVespaIds] Translated collection selections to Vespa field filters" and payload structure are preserved when switching to Logger.debug.

server/api/chat/tools/knowledgeBaseFlow.ts (2)
624-649: Potential N+1 query pattern in agent-scoped collection listing.

When `scope === KnowledgeBaseScope.AgentScoped`, the code iterates over `baseSelections` and calls `repo.getCollectionItemById()` individually for each folder and file ID, then `repo.getCollectionById()` for each unique collection. For agents with many selected items, this could result in numerous sequential DB queries.

Consider batching these lookups to reduce database round-trips:

♻️ Suggested batch optimization

```diff
+// Collect all item IDs first
+const allFolderIds: string[] = []
+const allFileIds: string[] = []
+for (const selection of scopeState.baseSelections) {
+  selection.collectionIds?.forEach((id) => collectionIds.add(id))
+  allFolderIds.push(...(selection.collectionFolderIds ?? []))
+  allFileIds.push(...(selection.collectionFileIds ?? []))
+}
+
+// Batch fetch all items in one query (requires repo method)
+const allItems = await repo.getCollectionItemsByIds?.([...allFolderIds, ...allFileIds]) ?? []
+for (const item of allItems) {
+  collectionIds.add(item.collectionId)
+}
```

This would require adding a batch lookup method to `KnowledgeBaseRepository`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@server/api/chat/tools/knowledgeBaseFlow.ts` around lines 624 - 649, The loop in the function that builds collectionIds from scopeState.baseSelections causes an N+1 query pattern by calling repo.getCollectionItemById() for each folder/file and repo.getCollectionById() for each collection; modify this to batch-fetch: add batch methods to KnowledgeBaseRepository (e.g., getCollectionItemsByIds(ids: string[]) and getCollectionsByIds(ids: string[])), collect all folder/file ids first, call getCollectionItemsByIds once, extract collectionIds, then call getCollectionsByIds once and return the filtered collections, replacing the per-item repo.getCollectionItemById and per-collection repo.getCollectionById calls.
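The batching idea, stripped of the ORM details, can be sketched as a pure function; `fetchItemsByIds` stands in for the proposed repository method backed by a single IN query, and the field names are simplified from the PR's selection shape.

```typescript
// Simplified item shape: each folder/file row carries its parent collection id.
type Item = { id: string; collectionId: string }

// One pass to gather every referenced item id, then a single batched fetch,
// instead of one repo.getCollectionItemById() call per id.
async function collectCollectionIds(
  selections: Array<{ folderIds?: string[]; fileIds?: string[] }>,
  fetchItemsByIds: (ids: string[]) => Promise<Item[]>,
): Promise<Set<string>> {
  const itemIds = selections.flatMap((s) => [
    ...(s.folderIds ?? []),
    ...(s.fileIds ?? []),
  ])
  const uniqueIds = [...new Set(itemIds)]
  const items = uniqueIds.length ? await fetchItemsByIds(uniqueIds) : []
  return new Set(items.map((item) => item.collectionId))
}
```

The same shape applies to the second round-trip: collect the resulting collection ids and resolve them with one `getCollectionsByIds` call rather than per-id lookups.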
1383-1389: Consider returning an empty success response instead of an error for no results.

Returning `ToolResponse.error` with `EXECUTION_FAILED` when no results are found may be semantically misleading: the tool executed correctly, it just found nothing. Returning an empty success response allows the agent to distinguish between "search worked but found nothing" and "search failed due to an error".

♻️ Alternative: return empty success

```diff
 if (!fragments.length) {
-  return ToolResponse.error(
-    ToolErrorCodes.EXECUTION_FAILED,
-    "No knowledge base results found for the query.",
-    { toolName: "searchKnowledgeBase" },
-  )
+  return ToolResponse.success({
+    result: "No knowledge base results found for the query.",
+    contexts: [],
+  })
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@server/api/chat/tools/knowledgeBaseFlow.ts` around lines 1383 - 1389, The current branch returns ToolResponse.error with ToolErrorCodes.EXECUTION_FAILED when the fragments array is empty, which semantically treats "no results" as a failure; instead return a successful ToolResponse indicating zero results (e.g., use ToolResponse.success or the codebase's success wrapper) with an empty results payload (e.g., { fragments: [] } or similar) and preserve the tool metadata (toolName: "searchKnowledgeBase") so callers can distinguish "no results" from an execution error; update the empty-case in the function handling fragments to return that empty-success response rather than ToolResponse.error.

server/api/chat/tool-schemas.ts (2)
110-140: Consider consolidating duplicate `participantsSchema` definitions.

The `MailParticipantSchema` defined here duplicates the `participantsSchema` in `server/api/chat/tools/google/gmail.ts` (as shown in the context snippet). Both define the same structure with `from`, `to`, `cc`, `bcc` string arrays. Consider having one canonical definition and importing it where needed to avoid drift.

♻️ Option: Export from one location

Either export `MailParticipantSchema` from `tool-schemas.ts` and import it in `gmail.ts`, or vice versa, to maintain a single source of truth.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@server/api/chat/tool-schemas.ts` around lines 110 - 140, The MailParticipantSchema defined here duplicates participantsSchema in gmail.ts; pick one canonical export (either MailParticipantSchema or participantsSchema), move the canonical definition to that file, export it, and update the other module to import and use that exported symbol instead of redefining it so there is a single source of truth for the { from, to, cc, bcc } string-array participant schema.
1004-1034: Depth limit of 1 may truncate useful nested parameter docs.

The `if (depth >= 1) continue` check stops recursion after one level of nesting. For schemas with deeper structures (like `filters.targets[].type`), nested parameter descriptions won't appear in generated docs. This may be intentional to keep output concise, but verify this meets documentation needs.

💡 Consider increasing depth or making it configurable

```diff
 function formatParameterLines(
   schema: JSONSchema7,
   pathPrefix = "",
   depth = 0,
+  maxDepth = 2,
 ): string[] {
   ...
-    if (depth >= 1) continue
+    if (depth >= maxDepth) continue
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@server/api/chat/tool-schemas.ts` around lines 1004 - 1034, The function formatParameterLines currently aborts recursion after one nested level via the check if (depth >= 1) continue, which truncates deeper schema docs; update formatParameterLines to accept and use a configurable maxDepth parameter (e.g., maxDepth = 2 or passed from the caller) instead of the hardcoded 1, replace the depth check with if (depth >= maxDepth) continue, and ensure recursive calls pass the maxDepth value through so deeply nested properties (e.g., filters.targets[].type) are included when desired.

server/api/chat/jaf-provider.ts (1)
859-863: Minor: Consider simplifying the buffer length check.

The nested ternary for data size detection works but could be slightly cleaner.

♻️ Optional simplification

```diff
-const dataSize = Buffer.isBuffer(filePart.data)
-  ? filePart.data.length
-  : filePart.data instanceof Uint8Array
-    ? filePart.data.length
-    : 0
+const dataSize =
+  filePart.data instanceof Uint8Array ? filePart.data.length : 0
```

`Buffer` in Node.js is a subclass of `Uint8Array`, so the single `instanceof Uint8Array` check covers both cases.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@server/api/chat/jaf-provider.ts` around lines 859 - 863, The dataSize computation uses a nested ternary checking Buffer and Uint8Array separately; simplify by relying on Uint8Array covering Buffer so replace the nested check in the dataSize assignment for filePart with a single instanceof Uint8Array branch (or equivalent truthy length check) to compute length when filePart.data is a byte array and otherwise default to 0; update the expression that sets dataSize (the variable named dataSize and the inspection of filePart.data) accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/queue/fileProcessor.ts`:
- Around line 345-366: The metadata JSON.stringify block currently spreads
file.metadata last which lets stale keys in file.metadata override freshly
computed values (e.g., chunksCount, imageChunksCount, processingMethod); change
the order so that file.metadata is spread first (before computed fields like
chunksCount, imageChunksCount, processingMethod, sheetName/index/totalSheets and
pageTitle) so the computed values from processingResult override any keys in
file.metadata—ensure this mirrors the behavior of mergeCollectionItemMetadata
where updates override baseMetadata.
In `@server/tests/knowledgeBaseTargetTranslation.test.ts`:
- Around line 353-367: Update the test mock to use the enum value instead of a
string literal: replace entity: "file" with entity: KnowledgeBaseEntity.File in
the returned fragment and add an import for KnowledgeBaseEntity at the top of
the test file (alongside Apps import) so the symbol is available; ensure the
mock object still uses Apps.KnowledgeBase for app and keep the rest unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6f392373-d1b5-4a0f-9832-08485239614a
📒 Files selected for processing (23)
- server/api/chat/agentPromptCreation.ts
- server/api/chat/jaf-adapter.ts
- server/api/chat/jaf-logging.ts
- server/api/chat/jaf-provider.ts
- server/api/chat/message-agents.ts
- server/api/chat/tool-schemas.ts
- server/api/chat/tools/global/index.ts
- server/api/chat/tools/google/calendar.ts
- server/api/chat/tools/google/contacts.ts
- server/api/chat/tools/google/drive.ts
- server/api/chat/tools/google/gmail.ts
- server/api/chat/tools/knowledgeBase.ts
- server/api/chat/tools/knowledgeBaseFlow.ts
- server/api/chat/tools/schemas.ts
- server/api/chat/tools/slack/getSlackMessages.ts
- server/db/knowledgeBase.ts
- server/db/schema/knowledgeBase.ts
- server/queue/fileProcessor.ts
- server/search/utils.ts
- server/search/vespa.ts
- server/tests/knowledgeBaseFlow.test.ts
- server/tests/knowledgeBaseTargetTranslation.test.ts
- server/tests/messageAgentsFragments.test.ts
💤 Files with no reviewable changes (1)
- server/api/chat/tools/knowledgeBase.ts
```diff
 metadata: JSON.stringify({
   originalFileName: file.originalName || file.fileName,
   uploadedBy: file.uploadedByEmail || "system",
-  chunksCount: processingResult.chunks.length + processingResult.image_chunks.length,
+  chunksCount:
+    processingResult.chunks.length +
+    processingResult.image_chunks.length,
   imageChunksCount: processingResult.image_chunks.length,
   processingMethod: getBaseMimeType(file.mimeType || "text/plain"),
-  ...(processingResult.processingMethod && { pdfProcessingMethod: processingResult.processingMethod }),
+  ...(processingResult.processingMethod && {
+    pdfProcessingMethod: processingResult.processingMethod,
+  }),
   ...(pageTitle && { pageTitle }),
   lastModified: Date.now(),
-  ...(('sheetName' in processingResult) && {
+  ...("sheetName" in processingResult && {
     sheetName: (processingResult as SheetProcessingResult).sheetName,
     sheetIndex: (processingResult as SheetProcessingResult).sheetIndex,
-    totalSheets: (processingResult as SheetProcessingResult).totalSheets,
+    totalSheets: (processingResult as SheetProcessingResult)
+      .totalSheets,
   }),
-  ...(typeof file.metadata === "object" && file.metadata !== null && { ...file.metadata }),
+  ...(typeof file.metadata === "object" &&
+    file.metadata !== null && { ...file.metadata }),
 }),
```
Metadata spread order may cause computed values to be overwritten.
The file.metadata spread at lines 364-365 comes after the computed fields (chunksCount, imageChunksCount, etc.). This means if file.metadata already contains these keys with stale values, they will override the freshly computed values.
This is inconsistent with mergeCollectionItemMetadata (line 388) where updates correctly override baseMetadata.
Consider moving the file.metadata spread to the beginning:
🔧 Suggested fix
metadata: JSON.stringify({
+ ...(typeof file.metadata === "object" &&
+ file.metadata !== null && { ...file.metadata }),
originalFileName: file.originalName || file.fileName,
uploadedBy: file.uploadedByEmail || "system",
chunksCount:
processingResult.chunks.length +
processingResult.image_chunks.length,
imageChunksCount: processingResult.image_chunks.length,
processingMethod: getBaseMimeType(file.mimeType || "text/plain"),
...(processingResult.processingMethod && {
pdfProcessingMethod: processingResult.processingMethod,
}),
...(pageTitle && { pageTitle }),
lastModified: Date.now(),
...("sheetName" in processingResult && {
sheetName: (processingResult as SheetProcessingResult).sheetName,
sheetIndex: (processingResult as SheetProcessingResult).sheetIndex,
totalSheets: (processingResult as SheetProcessingResult)
.totalSheets,
}),
- ...(typeof file.metadata === "object" &&
- file.metadata !== null && { ...file.metadata }),
}),

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
metadata: JSON.stringify({
  ...(typeof file.metadata === "object" &&
    file.metadata !== null && { ...file.metadata }),
  originalFileName: file.originalName || file.fileName,
  uploadedBy: file.uploadedByEmail || "system",
  chunksCount:
    processingResult.chunks.length +
    processingResult.image_chunks.length,
  imageChunksCount: processingResult.image_chunks.length,
  processingMethod: getBaseMimeType(file.mimeType || "text/plain"),
  ...(processingResult.processingMethod && {
    pdfProcessingMethod: processingResult.processingMethod,
  }),
  ...(pageTitle && { pageTitle }),
  lastModified: Date.now(),
  ...("sheetName" in processingResult && {
    sheetName: (processingResult as SheetProcessingResult).sheetName,
    sheetIndex: (processingResult as SheetProcessingResult).sheetIndex,
    totalSheets: (processingResult as SheetProcessingResult).totalSheets,
  }),
}),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/queue/fileProcessor.ts` around lines 345 - 366, the metadata JSON.stringify block currently spreads file.metadata last, which lets stale keys in file.metadata (e.g., chunksCount, imageChunksCount, processingMethod) override the freshly computed values. Change the order so that file.metadata is spread first, before the computed fields (chunksCount, imageChunksCount, processingMethod, sheetName/sheetIndex/totalSheets, and pageTitle), so that values derived from processingResult take precedence. This mirrors the behavior of mergeCollectionItemMetadata, where updates override baseMetadata.
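The fix above rests on JavaScript's spread semantics: in an object literal, the last spread wins for duplicate keys. A minimal sketch of that rule (the field names mirror the PR, but the values here are hypothetical):

```typescript
// In an object literal, later spreads override earlier ones for duplicate keys.
const computed = { chunksCount: 12, imageChunksCount: 4 }; // freshly computed fields
const stored = { chunksCount: 3, uploadedBy: "system" };   // possibly stale persisted metadata

// Stored metadata spread last: its stale chunksCount clobbers the fresh value.
const staleWins = { ...computed, ...stored };

// Stored metadata spread first: the computed fields take precedence, as the fix suggests.
const freshWins = { ...stored, ...computed };

console.log(staleWins.chunksCount); // 3
console.log(freshWins.chunksCount); // 12
console.log(freshWins.uploadedBy);  // "system" (non-conflicting keys survive either way)
```

The same rule explains why mergeCollectionItemMetadata behaves correctly: spreading updates after the base object is exactly what makes updates win.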
return [
  {
    id: "fragment-1",
    content: "hit",
    source: {
      docId: "clf-spec",
      title: "spec.md",
      url: "",
      app: Apps.KnowledgeBase,
      entity: "file",
    },
    confidence: 0.9,
  },
]
})
Fix type error: entity must use enum value, not string literal.
The pipeline failure indicates that entity: "file" (string) is not assignable to the expected entity union type. Use KnowledgeBaseEntity.File instead.
🐛 Proposed fix
First, update the import at line 2:
-import { Apps, SearchModes } from "@xyne/vespa-ts/types"
+import { Apps, KnowledgeBaseEntity, SearchModes } from "@xyne/vespa-ts/types"

Then fix the mock return value:
return [
{
id: "fragment-1",
content: "hit",
source: {
docId: "clf-spec",
title: "spec.md",
url: "",
app: Apps.KnowledgeBase,
- entity: "file",
+ entity: KnowledgeBaseEntity.File,
},
confidence: 0.9,
},
]

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/tests/knowledgeBaseTargetTranslation.test.ts` around lines 353 - 367,
Update the test mock to use the enum value instead of a string literal: replace entity: "file" with entity: KnowledgeBaseEntity.File in the returned fragment, and add an import for KnowledgeBaseEntity at the top of the test file (alongside the existing Apps import) so the symbol is available. Ensure the mock object still uses Apps.KnowledgeBase for app, and keep the rest unchanged.
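The error arises because TypeScript string enums are nominal at the type level even though they erase to plain strings at runtime. A minimal reproduction, using a simplified stand-in for the real KnowledgeBaseEntity exported by @xyne/vespa-ts/types:

```typescript
// Simplified stand-in for the KnowledgeBaseEntity enum from @xyne/vespa-ts/types.
enum KnowledgeBaseEntity {
  File = "file",
  Folder = "folder",
}

interface FragmentSource {
  docId: string;
  entity: KnowledgeBaseEntity; // expects an enum member, not a bare string
}

// const bad: FragmentSource = { docId: "clf-spec", entity: "file" };
// ^ TS2322: Type '"file"' is not assignable to type 'KnowledgeBaseEntity'.

const good: FragmentSource = {
  docId: "clf-spec",
  entity: KnowledgeBaseEntity.File, // type-checks; the runtime value is still "file"
};

console.log(String(good.entity)); // prints "file": string enums erase to their literal values
```

So the suggested change is purely a type-level fix: the serialized mock is identical either way, but only the enum member satisfies the union the pipeline checks against.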
Description
Testing
Additional Notes
Summary by CodeRabbit
New Features
Improvements
Internal