feat: Projects with shared knowledge (mini-RAG) #2192
Open
tessaherself wants to merge 47 commits into huggingface:main
… env and mobile nav
…tag for easier ci
- Add BINARY_DOC_ALLOWLIST for PDF, DOCX, XLSX, PPTX etc.
- Add COOKIE_SECURE=true and COOKIE_SAMESITE=lax to production
- Configure HF_ORG_ADMIN=xpartners-admins for admin access
- Fix avatar to use user.avatarUrl from OIDC instead of HuggingFace
- Change 'Add text file' to 'Add file', hide MCP Servers menu
- Add debug logging in prepareFiles() for file upload tracing
- Add file-upload-flow.md documentation
…gface#1) [Aikido] AI Fix for NoSQL injection attack possible
…uest huggingface#2) [Aikido] Fix critical issue in @sveltejs/kit via minor version upgrade from 2.21.2 to 2.49.5
…gface#3) [Aikido] AI Fix for NoSQL injection attack possible
- Fix NoSQL injection vulnerabilities by adding mongoSanitize.ts utility
- Fix timing attack in adminToken.ts using timingSafeEqual
- Add SSRF protection in isURLLocal.ts and models.ts
- Pin Docker base images to SHA digests in Dockerfile
- Add Kubernetes security context to deployment.yaml (runAsNonRoot, readOnlyRootFilesystem, drop ALL capabilities)
- Add NetworkPolicy for pod network isolation
- Pin GitHub Actions to specific SHA commits
- Fix path traversal in findRepoRoot.ts
- Update vulnerable dependencies (@aws-sdk, @modelcontextprotocol/sdk, elysia, pino)
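For readers unfamiliar with the timing-attack item above, here is a minimal sketch of a constant-time token check built on Node's `crypto.timingSafeEqual`. The helper name `tokensMatch` and the hash-before-compare step are assumptions for illustration; the actual change in adminToken.ts may look different.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper (not taken from the PR): compares two secrets without
// revealing, through response time, how many leading characters matched.
function tokensMatch(provided: string, expected: string): boolean {
  // Hash both inputs first so the two buffers always have the same length;
  // timingSafeEqual throws when the byte lengths differ.
  const a = createHash("sha256").update(provided, "utf8").digest();
  const b = createHash("sha256").update(expected, "utf8").digest();
  return timingSafeEqual(a, b);
}
```

A plain `provided === expected` can return as soon as the first differing character is found, which is the behaviour this kind of fix is meant to avoid.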
nanoid(7) uses alphabet A-Za-z0-9_- so the sanitizer regex was too restrictive.
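As an illustration of the regex point, a sketch of a share-ID check that accepts nanoid's full default alphabet. The names `SHARE_ID_PATTERN` and `isValidShareId` are hypothetical, and the fixed length of 7 is assumed from the `nanoid(7)` call mentioned above; the sanitizer in the PR may accept other ID shapes as well.

```typescript
// Hypothetical pattern: exactly 7 characters drawn from nanoid's default
// URL-safe alphabet (A-Z, a-z, 0-9, "_" and "-").
const SHARE_ID_PATTERN = /^[A-Za-z0-9_-]{7}$/;

// Returns true only for strings that could have come from nanoid(7).
function isValidShareId(id: string): boolean {
  return SHARE_ID_PATTERN.test(id);
}
```

A pattern limited to `[A-Za-z0-9]` would reject valid IDs such as `aB3_x-9`, which is the failure this commit describes.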
- Add explicit authFilter variables before MongoDB queries
- Add SECURITY comments to explain sanitization happens internally
- Sanitize params.id with sanitizeObjectIdString() before ObjectId
- This makes the data flow clearer for static analysis (Aikido)
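A rough sketch of what validating an ID before it reaches `ObjectId` can look like. Only the function name `sanitizeObjectIdString` comes from the commit message; the body below is an assumption, based on the fact that a MongoDB ObjectId is written as a 24-character hex string.

```typescript
// A MongoDB ObjectId in string form is exactly 24 hexadecimal characters.
const OBJECT_ID_HEX = /^[0-9a-fA-F]{24}$/;

// Assumed implementation, not copied from the PR: reject anything that is
// not a plain 24-char hex string before it is used in a query.
function sanitizeObjectIdString(value: unknown): string {
  if (typeof value !== "string" || !OBJECT_ID_HEX.test(value)) {
    throw new Error("Invalid ObjectId string");
  }
  return value;
}
```

Rejecting non-string input matters here because an object such as `{ $ne: null }` passed in place of an ID is the classic NoSQL injection payload.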
elysia 1.4 removed 'error' export, replaced by 'status()'
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The CI Dockerfile was running `npm run dev` (Vite dev server) in production, causing slow page loads and janky UI. Switch to a multi-stage build that runs `npm run build` and serves the compiled output via `node build/index.js`, matching the main Dockerfile pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of navigating away to home when the parent app sends a model switch message, PATCH the current conversation's model and refresh the page data. This lets users switch models mid-conversation without losing context.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Projects let users organize conversations with shared custom instructions, default models, and uploaded knowledge files. Knowledge injection uses a two-tier system: context stuffing for small files (<50k chars) and chunk+retrieve via HuggingFace TEI embeddings for larger knowledge bases.
New types: Project, ProjectKnowledgeFile, ProjectKnowledgeChunk
New API routes: /projects CRUD + /projects/:id/files CRUD
New UI: sidebar project section, create/edit modal, file upload manager
Pipeline: resolveProjectKnowledge injects into preprompt at generation time
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
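To make the two-tier idea concrete, a small sketch of tier selection and overlapping chunking. The function names `selectTier` and `chunkText` are hypothetical, the default numbers (50000, 1000, 200) are taken from the configuration values in the PR description, and the strict `<` comparison at the threshold is an assumption.

```typescript
type Tier = "stuff" | "retrieve";

// Hypothetical selector: small knowledge bases are pasted into the prompt
// whole; larger ones go through chunk+retrieve when an endpoint exists.
function selectTier(files: string[], teiEndpoint: string | undefined, threshold = 50000): Tier {
  const totalChars = files.reduce((sum, text) => sum + text.length, 0);
  // Without an embedding endpoint, fall back to context stuffing.
  if (!teiEndpoint) return "stuff";
  return totalChars < threshold ? "stuff" : "retrieve";
}

// Hypothetical chunker: fixed-size character windows, each sharing
// `overlap` characters with the previous window.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once the current window has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

With these defaults a 2,500-character file yields three chunks starting at offsets 0, 800 and 1600, so text cut at one chunk boundary still appears intact in the neighbouring chunk.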
Related discussion for community feedback on the knowledge/RAG approach: #2193
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf5f6533ae
- Remove debug console.log statements from prepareFiles and streaming
update handler that leaked attachment content and conversation data
- Remove hardcoded setTheme("light") that overwrote user theme preference
- Fix sanitizeParamId regex to accept _ and - in nanoid share IDs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Adds a Projects feature: named containers for conversations with shared context.
Knowledge injection (two-tier system)
- TEI_ENDPOINT env var.
Tier selection is automatic based on total project knowledge size. Falls back gracefully to Tier 1 if no TEI endpoint is configured.
What's included
- Project, ProjectKnowledgeFile, ProjectKnowledgeChunk
- /api/v2/projects/...)
- authCondition
- textGeneration/index.ts: knowledge becomes part of preprompt
Design decisions
- Conversation/Assistant ownership patterns (userId/sessionId)
- pdf-parse for PDF text extraction
Configuration
- TEI_ENDPOINT
- PROJECT_KNOWLEDGE_CHAR_THRESHOLD: 50000
- PROJECT_KNOWLEDGE_CHUNK_SIZE: 1000
- PROJECT_KNOWLEDGE_CHUNK_OVERLAP: 200
- PROJECT_KNOWLEDGE_TOP_K: 5
Relationship to existing work
Complementary to the drag-to-group branch (claude/drag-conversations-grouping-GpwPJ): that branch adds drag-and-drop as an interaction pattern, while Projects add semantic meaning (instructions, model defaults, knowledge). They could be combined in the future.
Test plan
- TEI_ENDPOINT → upload a large file → verify chunks are embedded and retrieved (Tier 2)
🤖 Generated with Claude Code
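For the Tier 2 retrieval step, a sketch of picking the top-K chunks for a query. It assumes embeddings have already been fetched (the TEI call is left out), and it assumes cosine similarity as the ranking metric, which the description does not state. `EmbeddedChunk`, `cosineSimilarity` and `topK` are hypothetical names; the default `k = 5` mirrors PROJECT_KNOWLEDGE_TOP_K.

```typescript
// Hypothetical shape: a chunk of text plus its precomputed embedding.
interface EmbeddedChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat a zero vector as having no similarity to anything.
  return denom === 0 ? 0 : dot / denom;
}

// Returns the k chunks most similar to the query, best match first.
function topK(query: number[], chunks: EmbeddedChunk[], k = 5): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

A linear scan like this is workable for a per-project knowledge base of modest size; a vector index would only become relevant at much larger chunk counts.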