Conversation

@Noname397 (Collaborator) commented Dec 1, 2025

📝 Description

This pull request introduces a new document upload and storage feature integrated with Supabase, along with related backend and frontend dependency updates. The main changes include new document models, Supabase storage service utilities, API endpoint registration for documents, configuration additions for file uploads, and required package updates.

Backend: Document Upload & Storage Integration

  • Added document models in models/document.py to support file upload, metadata, permissions, and responses for document operations.
  • Implemented Supabase storage service in services/storage.py for uploading, deleting, and generating URLs for files, including validation helpers for file type and size.
  • Registered the new document API endpoints in backend routing (api/endpoints/__init__.py, api/routes.py) and exported document models in models/__init__.py.
  • Added configuration options in core/config.py for Supabase credentials, storage bucket, max file size, and allowed file types.

Backend: Dependency Updates

  • Enabled required backend packages for Supabase and file handling: supabase, asyncpg, sqlalchemy[asyncio], python-multipart, and aiofiles in requirements.txt.

Frontend: Dependency Updates

  • Added @tanstack/react-query for data fetching and updated uuid to version 13.0.0 for unique identifier generation in package.json and package-lock.json.

Test instructions

  • Generate two valid UUIDs (mock user_id + thread_id).
  • Insert them into Supabase tables with simple INSERT SQL.
  • Hardcode these UUIDs in AssistantPage.jsx.
  • Call your endpoint using those IDs and check Supabase for expected changes.
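The steps above can be sketched end to end. The table names (`users`, `threads`) and column layout below are assumptions to adapt to the actual Supabase schema:

```python
# Sketch of the manual test flow. Table names (users, threads) and
# columns are assumptions -- adjust to the real Supabase schema.
import uuid

user_id = uuid.uuid4()
thread_id = uuid.uuid4()

# Simple INSERTs to run in the Supabase SQL editor to seed mock rows:
seed_sql = (
    f"INSERT INTO users (id) VALUES ('{user_id}');\n"
    f"INSERT INTO threads (id, user_id) VALUES ('{thread_id}', '{user_id}');"
)
print(seed_sql)
# Hardcode str(user_id) and str(thread_id) in AssistantPage.jsx, then call
# the upload endpoint with those IDs and verify the rows in Supabase.
```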

🎯 Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • 🔨 Refactoring (no functional changes)
  • 🧪 Tests (adding or updating tests)
  • 🔧 Chore (dependency updates, config changes, etc.)

🧪 Testing

  • I have tested this change locally
  • [ ] I have added/updated tests for this change
  • [ ] All existing tests pass

📋 Checklist

  • My code follows the code style of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings

📸 Screenshots (if applicable)

N/A

🔗 Related Issues

Copilot AI review requested due to automatic review settings December 1, 2025 01:21
@Noname397 Noname397 self-assigned this Dec 1, 2025
Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 28 comments.

Files not reviewed (1)
  • frontend/package-lock.json: Language not supported


    file=file_content,
    file_options={
        "content-type": file_type,
        "upsert": "false",  # Don't overwrite existing files

Copilot AI Dec 1, 2025


The upsert option is set to "false" (string), but it should be a boolean value False. Check the Supabase Python SDK documentation to verify the correct type, as strings may not be properly interpreted.

Suggested change
-        "upsert": "false",  # Don't overwrite existing files
+        "upsert": False,  # Don't overwrite existing files



@router.get("/{document_id}/download")
async def get_document_download_url(document_id: str, expires_in: int = 3600):

Copilot AI Dec 1, 2025


Missing validation for expires_in parameter. Negative or excessively large values could cause issues. Add validation to ensure it's within a reasonable range:

if expires_in <= 0 or expires_in > 86400:  # Max 24 hours
    raise HTTPException(
        status_code=status.HTTP_400_BAD_REQUEST,
        detail="expires_in must be between 1 and 86400 seconds (24 hours)"
    )

Comment on lines +341 to +351
    # Delete from Supabase Storage
    try:
        await delete_file(document["storage_ref"])
    except StorageError as e:
        logger.warning(
            f"Failed to delete from storage (continuing with DB delete): {e}"
        )

    # Delete from database
    delete_query = "DELETE FROM documents WHERE id = %s"
    affected_rows = execute_statement(delete_query, (str(doc_uuid),))

Copilot AI Dec 1, 2025


If storage deletion fails but database deletion succeeds, the document metadata is removed but the file remains in storage, creating an orphaned file. Consider either:

  1. Failing the entire operation if storage deletion fails
  2. Implementing a background cleanup job to remove orphaned files
  3. Logging orphaned files for manual cleanup

The current approach (continuing after storage failure) creates data inconsistency.
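A minimal sketch of option 3, assuming an async `delete_file` like the one in `services/storage.py`; the in-memory list stands in for a persistent orphan table:

```python
# Record storage refs whose deletion failed so a later sweep (or manual
# cleanup) can retry them, instead of silently losing track of the file.
import asyncio

orphaned_refs: list[str] = []

async def delete_with_orphan_tracking(storage_ref: str, delete_file) -> bool:
    """Try to delete a file; on failure, record the ref for cleanup."""
    try:
        await delete_file(storage_ref)
        return True
    except Exception:  # stand-in for StorageError
        orphaned_refs.append(storage_ref)
        return False
```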

Comment on lines +108 to +111
    if (downloadData?.download_url && selectedDocumentId) {
      window.open(downloadData.download_url, "_blank");
      setSelectedDocumentId(null); // Reset after opening
    }

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation causes the download URL to open in a new tab on every render when downloadData and selectedDocumentId are truthy. This should be inside a useEffect hook to prevent it from executing on every render cycle.

useEffect(() => {
  if (downloadData?.download_url && selectedDocumentId) {
    window.open(downloadData.download_url, "_blank");
    setSelectedDocumentId(null);
  }
}, [downloadData, selectedDocumentId]);
Suggested change
-  if (downloadData?.download_url && selectedDocumentId) {
-    window.open(downloadData.download_url, "_blank");
-    setSelectedDocumentId(null); // Reset after opening
-  }
+  import { useEffect } from "react";
+  useEffect(() => {
+    if (downloadData?.download_url && selectedDocumentId) {
+      window.open(downloadData.download_url, "_blank");
+      setSelectedDocumentId(null); // Reset after opening
+    }
+  }, [downloadData, selectedDocumentId]);

Comment on lines +76 to +85
class DocumentUploadRequest(BaseModel):
    """Request model for file upload endpoint."""

    thread_id: UUID = Field(..., description="Thread to attach document to")
    permission: PermissionEnum = Field(
        default=PermissionEnum.PRIVATE, description="Access permission level"
    )
    metadata: Optional[Dict[str, Any]] = Field(
        default=None, description="Additional metadata"
    )

Copilot AI Dec 1, 2025


DocumentUploadRequest model is defined but never used in the endpoints. The upload endpoint uses Form parameters directly instead of utilizing this Pydantic model. Consider either using the model for validation or removing it to reduce code clutter.

Comment on lines +90 to +94
      alert("Files uploaded successfully!");
      refetchDocuments();
    } catch (error) {
      console.error("Error uploading files:", error);
      alert("Error uploading files. Please try again.");

Copilot AI Dec 1, 2025


[nitpick] Using alert() for error messages is not user-friendly and doesn't follow modern UI patterns. Consider using a proper toast notification system or error state display component instead.

Comment on lines +141 to +151
    if not result:
        # Rollback storage upload if database insert fails
        try:
            await delete_file(storage_ref)
        except StorageError:
            logger.error(f"Failed to rollback storage upload: {storage_ref}")

        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="Failed to save document metadata",
        )

Copilot AI Dec 1, 2025


Potential race condition: if the file is successfully uploaded to storage but the database insert fails, the rollback attempts to delete the file. However, if the deletion also fails, the file remains orphaned in storage with no database record. Consider implementing a cleanup job or improving the error handling to ensure consistency between storage and database.
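One way to narrow (though not eliminate) this window is to retry the rollback before giving up. A sketch, where `delete_file` and the backoff numbers are illustrative:

```python
# Bounded-retry rollback: attempt the storage delete a few times with
# exponential backoff, and report failure so the caller can queue the
# ref for out-of-band cleanup rather than leaving it untracked.
import asyncio

async def rollback_upload(storage_ref: str, delete_file,
                          attempts: int = 3, base_delay: float = 0.0) -> bool:
    for i in range(attempts):
        try:
            await delete_file(storage_ref)
            return True
        except Exception:  # stand-in for StorageError
            if i + 1 < attempts:
                await asyncio.sleep(base_delay * (2 ** i))
    return False  # caller should log/queue storage_ref for cleanup
```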

Comment on lines +399 to +445
    query = """
        SELECT id, thread_id, uploader_id, file_name, file_type,
               file_size, storage_ref, indexed, permission,
               created_at, updated_at, metadata
        FROM documents
        WHERE 1=1
    """
    count_query = "SELECT COUNT(*) as total FROM documents WHERE 1=1"
    params = []

    if thread_id:
        try:
            UUID(thread_id)
        except ValueError:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="Invalid thread_id format",
            )
        query += " AND thread_id = %s"
        count_query += " AND thread_id = %s"
        params.append(thread_id)

    if uploader_id:
        try:
            UUID(uploader_id)
        except ValueError:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="Invalid uploader_id format",
            )
        query += " AND uploader_id = %s"
        count_query += " AND uploader_id = %s"
        params.append(uploader_id)

    # Add pagination
    offset = (page - 1) * per_page
    query += " ORDER BY created_at DESC LIMIT %s OFFSET %s"

    # Get total count
    total_result = execute_query(
        count_query, tuple(params) if params else None, fetch_one=True
    )
    total = total_result["total"] if total_result else 0

    # Get documents
    params.extend([per_page, offset])
    documents = execute_query(query, tuple(params))

Copilot AI Dec 1, 2025


Missing database indexes for frequently queried columns. The list_documents endpoint filters by thread_id and uploader_id and orders by created_at. Without proper indexes, these queries will perform full table scans as the documents table grows. Ensure the database schema includes indexes on:

  • thread_id
  • uploader_id
  • created_at (for sorting)
  • Composite index on (thread_id, created_at) and (uploader_id, created_at) for optimal query performance
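Assuming Postgres under Supabase and the `documents` table shown in the query above, the suggested indexes might look like this; note that a composite index also covers lookups on its leading column, so separate single-column indexes on `thread_id` and `uploader_id` may be redundant:

```sql
-- Illustrative DDL; adjust names to the project's migration conventions.
CREATE INDEX IF NOT EXISTS idx_documents_thread_created
    ON documents (thread_id, created_at DESC);
CREATE INDEX IF NOT EXISTS idx_documents_uploader_created
    ON documents (uploader_id, created_at DESC);
```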

@@ -0,0 +1,112 @@
import { useQuery, useMutation, useQueryClient } from "@tanstack/react-query";

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused imports useMutation, useQueryClient.

Suggested change
- import { useQuery, useMutation, useQueryClient } from "@tanstack/react-query";
+ import { useQuery } from "@tanstack/react-query";

Comment on lines +3 to +7
  uploadDocument,
  getDocument,
  getDocuments,
  getDocumentDownloadUrl,
  deleteDocument,

Copilot AI Dec 1, 2025


Unused imports deleteDocument, uploadDocument.

Suggested change
-  uploadDocument,
   getDocument,
   getDocuments,
   getDocumentDownloadUrl,
-  deleteDocument,
