Conversation

@Noname397 (Collaborator) commented Dec 1, 2025

📝 Description

This pull request introduces a new document upload and storage feature integrated with Supabase, along with related backend and frontend dependency updates. The main changes include new document models, Supabase storage service utilities, API endpoint registration for documents, configuration additions for file uploads, and required package updates.

Backend: Document Upload & Storage Integration

  • Added document models in models/document.py to support file upload, metadata, permissions, and responses for document operations.
  • Implemented Supabase storage service in services/storage.py for uploading, deleting, and generating URLs for files, including validation helpers for file type and size.
  • Registered the new document API endpoints in backend routing (api/endpoints/__init__.py, api/routes.py) and exported document models in models/__init__.py.
  • Added configuration options in core/config.py for Supabase credentials, storage bucket, max file size, and allowed file types.

Backend: Dependency Updates

  • Enabled required backend packages for Supabase and file handling: supabase, asyncpg, sqlalchemy[asyncio], python-multipart, and aiofiles in requirements.txt.

Frontend: Dependency Updates

  • Added @tanstack/react-query for data fetching and updated uuid to version 13.0.0 for unique identifier generation in package.json and package-lock.json.

Test instructions

  • Generate two valid UUIDs (mock user_id + thread_id).
  • Insert them into Supabase tables with simple INSERT SQL.
  • Hardcode these UUIDs in AssistantPage.jsx.
  • Call your endpoint using those IDs and check Supabase for expected changes.
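The steps above can be sketched end to end. The table names (`users`, `threads`) and column layout below are assumptions to adapt to the actual Supabase schema:

```python
# Sketch of the manual test flow. Table names (users, threads) and
# columns are assumptions -- adjust to the real Supabase schema.
import uuid

user_id = uuid.uuid4()
thread_id = uuid.uuid4()

# Simple INSERTs to run in the Supabase SQL editor to seed mock rows:
seed_sql = (
    f"INSERT INTO users (id) VALUES ('{user_id}');\n"
    f"INSERT INTO threads (id, user_id) VALUES ('{thread_id}', '{user_id}');"
)
print(seed_sql)
# Hardcode str(user_id) and str(thread_id) in AssistantPage.jsx, then call
# the upload endpoint with those IDs and verify the rows in Supabase.
```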

🎯 Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • 🔨 Refactoring (no functional changes)
  • 🧪 Tests (adding or updating tests)
  • 🔧 Chore (dependency updates, config changes, etc.)

🧪 Testing

  • I have tested this change locally
  • [ ] I have added/updated tests for this change
  • [ ] All existing tests pass

📋 Checklist

  • My code follows the code style of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings

📸 Screenshots (if applicable)

N/A

🔗 Related Issues

Copilot AI review requested due to automatic review settings December 1, 2025 01:21
@Noname397 Noname397 self-assigned this Dec 1, 2025
Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 28 comments.

Files not reviewed (1)
  • frontend/package-lock.json: Language not supported


    file=file_content,
    file_options={
        "content-type": file_type,
        "upsert": "false",  # Don't overwrite existing files

Copilot AI Dec 1, 2025


The upsert option is set to "false" (string), but it should be a boolean value False. Check the Supabase Python SDK documentation to verify the correct type, as strings may not be properly interpreted.

Suggested change
-        "upsert": "false",  # Don't overwrite existing files
+        "upsert": False,  # Don't overwrite existing files



@router.get("/{document_id}/download")
async def get_document_download_url(document_id: str, expires_in: int = 3600):

Copilot AI Dec 1, 2025


Missing validation for expires_in parameter. Negative or excessively large values could cause issues. Add validation to ensure it's within a reasonable range:

if expires_in <= 0 or expires_in > 86400:  # Max 24 hours
    raise HTTPException(
        status_code=status.HTTP_400_BAD_REQUEST,
        detail="expires_in must be between 1 and 86400 seconds (24 hours)"
    )

Comment on lines +341 to +351
    # Delete from Supabase Storage
    try:
        await delete_file(document["storage_ref"])
    except StorageError as e:
        logger.warning(
            f"Failed to delete from storage (continuing with DB delete): {e}"
        )

    # Delete from database
    delete_query = "DELETE FROM documents WHERE id = %s"
    affected_rows = execute_statement(delete_query, (str(doc_uuid),))

Copilot AI Dec 1, 2025


If storage deletion fails but database deletion succeeds, the document metadata is removed but the file remains in storage, creating an orphaned file. Consider either:

  1. Failing the entire operation if storage deletion fails
  2. Implementing a background cleanup job to remove orphaned files
  3. Logging orphaned files for manual cleanup

The current approach (continuing after storage failure) creates data inconsistency.
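A minimal sketch of option 3, assuming an async `delete_file` like the one in `services/storage.py`; the in-memory list stands in for a persistent orphan table:

```python
# Record storage refs whose deletion failed so a later sweep (or manual
# cleanup) can retry them, instead of silently losing track of the file.
import asyncio

orphaned_refs: list[str] = []

async def delete_with_orphan_tracking(storage_ref: str, delete_file) -> bool:
    """Try to delete a file; on failure, record the ref for cleanup."""
    try:
        await delete_file(storage_ref)
        return True
    except Exception:  # stand-in for StorageError
        orphaned_refs.append(storage_ref)
        return False
```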

Comment on lines +108 to +111
    if (downloadData?.download_url && selectedDocumentId) {
      window.open(downloadData.download_url, "_blank");
      setSelectedDocumentId(null); // Reset after opening
    }

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation causes the download URL to open in a new tab on every render when downloadData and selectedDocumentId are truthy. This should be inside a useEffect hook to prevent it from executing on every render cycle.

useEffect(() => {
  if (downloadData?.download_url && selectedDocumentId) {
    window.open(downloadData.download_url, "_blank");
    setSelectedDocumentId(null);
  }
}, [downloadData, selectedDocumentId]);
Suggested change
-  if (downloadData?.download_url && selectedDocumentId) {
-    window.open(downloadData.download_url, "_blank");
-    setSelectedDocumentId(null); // Reset after opening
-  }
+  import { useEffect } from "react";
+  useEffect(() => {
+    if (downloadData?.download_url && selectedDocumentId) {
+      window.open(downloadData.download_url, "_blank");
+      setSelectedDocumentId(null); // Reset after opening
+    }
+  }, [downloadData, selectedDocumentId]);

Comment on lines +76 to +85
class DocumentUploadRequest(BaseModel):
    """Request model for file upload endpoint."""

    thread_id: UUID = Field(..., description="Thread to attach document to")
    permission: PermissionEnum = Field(
        default=PermissionEnum.PRIVATE, description="Access permission level"
    )
    metadata: Optional[Dict[str, Any]] = Field(
        default=None, description="Additional metadata"
    )

Copilot AI Dec 1, 2025


DocumentUploadRequest model is defined but never used in the endpoints. The upload endpoint uses Form parameters directly instead of utilizing this Pydantic model. Consider either using the model for validation or removing it to reduce code clutter.

Comment on lines +90 to +94
      alert("Files uploaded successfully!");
      refetchDocuments();
    } catch (error) {
      console.error("Error uploading files:", error);
      alert("Error uploading files. Please try again.");

Copilot AI Dec 1, 2025


[nitpick] Using alert() for error messages is not user-friendly and doesn't follow modern UI patterns. Consider using a proper toast notification system or error state display component instead.

Comment on lines +141 to +151
    if not result:
        # Rollback storage upload if database insert fails
        try:
            await delete_file(storage_ref)
        except StorageError:
            logger.error(f"Failed to rollback storage upload: {storage_ref}")

        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="Failed to save document metadata",
        )

Copilot AI Dec 1, 2025


Potential race condition: if the file is successfully uploaded to storage but the database insert fails, the rollback attempts to delete the file. However, if the deletion also fails, the file remains orphaned in storage with no database record. Consider implementing a cleanup job or improving the error handling to ensure consistency between storage and database.
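One way to narrow (though not eliminate) this window is to retry the rollback before giving up. A sketch, where `delete_file` and the backoff numbers are illustrative:

```python
# Bounded-retry rollback: attempt the storage delete a few times with
# exponential backoff, and report failure so the caller can queue the
# ref for out-of-band cleanup rather than leaving it untracked.
import asyncio

async def rollback_upload(storage_ref: str, delete_file,
                          attempts: int = 3, base_delay: float = 0.0) -> bool:
    for i in range(attempts):
        try:
            await delete_file(storage_ref)
            return True
        except Exception:  # stand-in for StorageError
            if i + 1 < attempts:
                await asyncio.sleep(base_delay * (2 ** i))
    return False  # caller should log/queue storage_ref for cleanup
```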

Comment on lines +399 to +445
    query = """
        SELECT id, thread_id, uploader_id, file_name, file_type,
               file_size, storage_ref, indexed, permission,
               created_at, updated_at, metadata
        FROM documents
        WHERE 1=1
    """
    count_query = "SELECT COUNT(*) as total FROM documents WHERE 1=1"
    params = []

    if thread_id:
        try:
            UUID(thread_id)
        except ValueError:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="Invalid thread_id format",
            )
        query += " AND thread_id = %s"
        count_query += " AND thread_id = %s"
        params.append(thread_id)

    if uploader_id:
        try:
            UUID(uploader_id)
        except ValueError:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="Invalid uploader_id format",
            )
        query += " AND uploader_id = %s"
        count_query += " AND uploader_id = %s"
        params.append(uploader_id)

    # Add pagination
    offset = (page - 1) * per_page
    query += " ORDER BY created_at DESC LIMIT %s OFFSET %s"

    # Get total count
    total_result = execute_query(
        count_query, tuple(params) if params else None, fetch_one=True
    )
    total = total_result["total"] if total_result else 0

    # Get documents
    params.extend([per_page, offset])
    documents = execute_query(query, tuple(params))

Copilot AI Dec 1, 2025


Missing database indexes for frequently queried columns. The list_documents endpoint filters by thread_id and uploader_id and orders by created_at. Without proper indexes, these queries will perform full table scans as the documents table grows. Ensure the database schema includes indexes on:

  • thread_id
  • uploader_id
  • created_at (for sorting)
  • Composite index on (thread_id, created_at) and (uploader_id, created_at) for optimal query performance
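Assuming Postgres under Supabase and the `documents` table shown in the query above, the suggested indexes might look like this; note that a composite index also covers lookups on its leading column, so separate single-column indexes on `thread_id` and `uploader_id` may be redundant:

```sql
-- Illustrative DDL; adjust names to the project's migration conventions.
CREATE INDEX IF NOT EXISTS idx_documents_thread_created
    ON documents (thread_id, created_at DESC);
CREATE INDEX IF NOT EXISTS idx_documents_uploader_created
    ON documents (uploader_id, created_at DESC);
```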

@@ -0,0 +1,112 @@
import { useQuery, useMutation, useQueryClient } from "@tanstack/react-query";

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused imports useMutation, useQueryClient.

Suggested change
- import { useQuery, useMutation, useQueryClient } from "@tanstack/react-query";
+ import { useQuery } from "@tanstack/react-query";

Comment on lines +3 to +7
  uploadDocument,
  getDocument,
  getDocuments,
  getDocumentDownloadUrl,
  deleteDocument,

Copilot AI Dec 1, 2025


Unused imports deleteDocument, uploadDocument.

Suggested change
-  uploadDocument,
   getDocument,
   getDocuments,
   getDocumentDownloadUrl,
-  deleteDocument,
