Closed
58 commits
5cc0889
✨ Refactor item management to document management in API and models
monicasmith463 Jul 18, 2025
540461c
✨ Update document model and add S3 upload functionality
monicasmith463 Jul 25, 2025
f41dfd2
✨ Add S3 file upload functionality and update document creation process
monicasmith463 Jul 28, 2025
e9e87af
✨ Add Python version file and expose PostgreSQL port in Docker Compose
monicasmith463 Jul 28, 2025
fea3bda
skip using alembic for now
monicasmith463 Jul 28, 2025
f084486
✨ Remove item management API routes and associated tests
monicasmith463 Jul 28, 2025
24bb157
✨ Remove obsolete Alembic migration scripts for cascade delete relati…
monicasmith463 Jul 28, 2025
8fb03d6
✨ Add .env to .gitignore to prevent sensitive information from being …
monicasmith463 Jul 28, 2025
26bd7dd
✨ Add text extraction functionality from S3 files and save to database
monicasmith463 Jul 28, 2025
ebb8965
✨ Add extracted_text field to Document model and implement text extra…
monicasmith463 Jul 29, 2025
a20ace4
✨ Add text extraction dependencies and integrate extraction in docume…
monicasmith463 Jul 29, 2025
a140f2e
Merge branch 'master' of https://github.com/monicasmith463/study-assi…
monicasmith463 Aug 4, 2025
cd7d024
don't do alembic migrations for now
monicasmith463 Aug 4, 2025
bcdb323
run ruff to fix linter build
monicasmith463 Aug 4, 2025
bcd6c96
remove add-to-project job from the build
monicasmith463 Aug 4, 2025
30a4777
Merge pull request #1 from monicasmith463/infra/remove-alembic-migration
monicasmith463 Aug 4, 2025
b53f90f
undo delete backend/app/alembic/versions/
monicasmith463 Aug 4, 2025
9012074
Merge branch 'master' of https://github.com/monicasmith463/study-assi…
monicasmith463 Aug 4, 2025
8040455
✨ Autogenerate frontend client
invalid-email-address Aug 4, 2025
59b3d11
fix lint err in model DocumentBase make fieldds optional
monicasmith463 Aug 4, 2025
9509aeb
add boto3-stubs
monicasmith463 Aug 5, 2025
adc5119
convert UUID to a string in documents.py
monicasmith463 Aug 5, 2025
d8efa80
fix return type annotation
monicasmith463 Aug 5, 2025
b7bbb80
Merge branch 'feature/file-upload' of https://github.com/monicasmith4…
monicasmith463 Aug 5, 2025
7b21986
removed unused type ignore
monicasmith463 Aug 5, 2025
97fd600
Update backend/app/api/routes/documents.py
monicasmith463 Aug 5, 2025
3bcc444
fix: remove local development path from requirements.txt
devloai[bot] Aug 5, 2025
10ae893
fix: configure S3 client with proper AWS credentials from settings
devloai[bot] Aug 5, 2025
3405175
Update backend/app/s3.py
monicasmith463 Aug 5, 2025
5864e66
remove requirements.txt and python-version
monicasmith463 Aug 5, 2025
59b588d
merge
monicasmith463 Aug 5, 2025
a2a920f
fix lint errors
monicasmith463 Aug 5, 2025
5e6b6a2
Delete backend/requirements.txt
monicasmith463 Aug 5, 2025
ed1ce65
Delete requirements.txt
monicasmith463 Aug 5, 2025
678e393
Update .env
monicasmith463 Aug 5, 2025
f66e128
add new AWS secrets for test workflow
monicasmith463 Aug 5, 2025
8ab5719
update deploy-staging
monicasmith463 Aug 5, 2025
344ce94
update docker-compose
monicasmith463 Aug 5, 2025
62c48e8
add also to backend env
monicasmith463 Aug 5, 2025
9dd39d5
update env secrets in the test-backend.yml
monicasmith463 Aug 5, 2025
f57686b
update generate-client.yml, playwright.yml, test-docker-compose.yml
monicasmith463 Aug 5, 2025
270e13a
✨ Autogenerate frontend client
invalid-email-address Aug 5, 2025
4118279
update playwirght.yml again
monicasmith463 Aug 5, 2025
2e32d50
Merge branch 'feature/file-upload' of https://github.com/monicasmith4…
monicasmith463 Aug 5, 2025
aaef247
add Item back into models.py for now
monicasmith463 Aug 5, 2025
5c6cfbb
make Item model inactive
monicasmith463 Aug 5, 2025
4c05f36
revert changes to model
monicasmith463 Aug 5, 2025
eb4ac0a
safe filename
monicasmith463 Aug 5, 2025
0b107e2
run ruff lint fixes
monicasmith463 Aug 5, 2025
28e3662
ruff format app
monicasmith463 Aug 5, 2025
54925db
Merge pull request #3 from monicasmith463/feature/file-upload
monicasmith463 Aug 5, 2025
6607321
merge master
monicasmith463 Aug 5, 2025
7f943ac
Update pyproject.toml
monicasmith463 Aug 5, 2025
96e865d
fix select statement in extractors for lint error
monicasmith463 Aug 5, 2025
3dfe720
fix more lint errors
monicasmith463 Aug 5, 2025
400591f
fix extract test:
monicasmith463 Aug 5, 2025
410e6b9
refactor extractors to use S3 text extraction function and update dep…
monicasmith463 Aug 6, 2025
ef1aed2
remove AddItem component from Items layout
monicasmith463 Aug 6, 2025
18 changes: 0 additions & 18 deletions .github/workflows/add-to-project.yml

This file was deleted.

4 changes: 4 additions & 0 deletions .github/workflows/deploy-production.yml
@@ -24,6 +24,10 @@ jobs:
SMTP_PASSWORD: ${{ secrets.SMTP_PASSWORD }}
EMAILS_FROM_EMAIL: ${{ secrets.EMAILS_FROM_EMAIL }}
POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION }}
S3_BUCKET: ${{ secrets.S3_BUCKET }}
SENTRY_DSN: ${{ secrets.SENTRY_DSN }}
steps:
- name: Checkout
4 changes: 4 additions & 0 deletions .github/workflows/deploy-staging.yml
@@ -24,6 +24,10 @@ jobs:
SMTP_PASSWORD: ${{ secrets.SMTP_PASSWORD }}
EMAILS_FROM_EMAIL: ${{ secrets.EMAILS_FROM_EMAIL }}
POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION }}
S3_BUCKET: ${{ secrets.S3_BUCKET }}
SENTRY_DSN: ${{ secrets.SENTRY_DSN }}
steps:
- name: Checkout
5 changes: 5 additions & 0 deletions .github/workflows/generate-client.yml
@@ -11,6 +11,11 @@ jobs:
permissions:
contents: write
runs-on: ubuntu-latest
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION }}
S3_BUCKET: ${{ secrets.S3_BUCKET }}
steps:
# For PRs from forks
- uses: actions/checkout@v4
20 changes: 20 additions & 0 deletions .github/workflows/playwright.yml
@@ -18,6 +18,11 @@ on:
jobs:
changes:
runs-on: ubuntu-latest
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION }}
S3_BUCKET: ${{ secrets.S3_BUCKET }}
# Set job outputs to values from filter step
outputs:
changed: ${{ steps.filter.outputs.changed }}
@@ -41,6 +46,11 @@ jobs:
if: ${{ needs.changes.outputs.changed == 'true' }}
timeout-minutes: 60
runs-on: ubuntu-latest
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION }}
S3_BUCKET: ${{ secrets.S3_BUCKET }}
strategy:
matrix:
shardIndex: [1, 2, 3, 4]
@@ -92,6 +102,11 @@ jobs:
# Merge reports after playwright-tests, even if some shards have failed
if: ${{ !cancelled() && needs.changes.outputs.changed == 'true' }}
runs-on: ubuntu-latest
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION }}
S3_BUCKET: ${{ secrets.S3_BUCKET }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
@@ -123,6 +138,11 @@ jobs:
needs:
- test-playwright
runs-on: ubuntu-latest
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION }}
S3_BUCKET: ${{ secrets.S3_BUCKET }}
steps:
- name: Decide whether the needed jobs succeeded or failed
uses: re-actors/alls-green@release/v1
5 changes: 5 additions & 0 deletions .github/workflows/test-backend.yml
@@ -12,6 +12,11 @@ on:
jobs:
test-backend:
runs-on: ubuntu-latest
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION }}
S3_BUCKET: ${{ secrets.S3_BUCKET }}
steps:
- name: Checkout
uses: actions/checkout@v4
5 changes: 5 additions & 0 deletions .github/workflows/test-docker-compose.yml
@@ -13,6 +13,11 @@ jobs:

test-docker-compose:
runs-on: ubuntu-latest
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION }}
S3_BUCKET: ${{ secrets.S3_BUCKET }}
steps:
- name: Checkout
uses: actions/checkout@v4
1 change: 1 addition & 0 deletions .gitignore
@@ -4,3 +4,4 @@ node_modules/
/playwright-report/
/blob-report/
/playwright/.cache/
.env
4 changes: 2 additions & 2 deletions backend/app/api/main.py
@@ -1,13 +1,13 @@
 from fastapi import APIRouter

-from app.api.routes import items, login, private, users, utils
+from app.api.routes import documents, login, private, users, utils
 from app.core.config import settings

 api_router = APIRouter()
 api_router.include_router(login.router)
 api_router.include_router(users.router)
 api_router.include_router(utils.router)
-api_router.include_router(items.router)
+api_router.include_router(documents.router)


 if settings.ENVIRONMENT == "local":
50 changes: 50 additions & 0 deletions backend/app/api/routes/documents.py
@@ -0,0 +1,50 @@
from typing import Any

from fastapi import APIRouter, BackgroundTasks, File, HTTPException, UploadFile

from app.api.deps import CurrentUser, SessionDep
from app.core.extractors import extract_text_and_save_to_db
from app.models import Document, DocumentCreate, DocumentPublic
from app.s3 import generate_s3_url, upload_file_to_s3

router = APIRouter(prefix="/documents", tags=["documents"])


@router.post("/", response_model=DocumentPublic)
def create_document(
    *,
    session: SessionDep,
    current_user: CurrentUser,
    background_tasks: BackgroundTasks,  # noqa: ARG001
    file: UploadFile = File(...),
) -> Any:
    key = None
    try:
        key = upload_file_to_s3(file, str(current_user.id))
    except Exception as e:
        raise HTTPException(500, f"Failed to upload file. Error: {str(e)}")

    try:
        url = generate_s3_url(key)
    except Exception:
        raise HTTPException(500, f"Could not generate URL for file key: {key}")

    document_in = DocumentCreate(
        filename=file.filename,
        content_type=file.content_type,
        size=file.size,
        s3_url=url,
    )

    document = Document.model_validate(
        document_in, update={"owner_id": current_user.id}
    )

    session.add(document)
    session.commit()
    session.refresh(document)

    # 3. Kick off background job
    print("Document created, starting background task...")
    background_tasks.add_task(extract_text_and_save_to_db, url, str(document.id))
    return document
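The route above hands the upload and the owner's id to `upload_file_to_s3`, whose body lives in `app/s3.py` and is not part of this excerpt. As a rough sketch only, one way such a helper could derive a collision-free object key from an untrusted filename is shown below. `build_object_key` and its `documents/<owner>/<random>_<name>` layout are hypothetical, not the PR's implementation.

```python
import re
import uuid
from pathlib import PurePosixPath
from typing import Optional


def build_object_key(owner_id: str, filename: Optional[str]) -> str:
    """Hypothetical helper: build an S3 object key from an untrusted filename."""
    # Keep only the final path component so "../../etc/passwd" cannot escape the prefix.
    name = PurePosixPath((filename or "upload").replace("\\", "/")).name
    # Replace anything outside a conservative character set, and drop leading dots.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".") or "upload"
    # A random prefix keeps two uploads with the same name from overwriting each other.
    return f"documents/{owner_id}/{uuid.uuid4().hex}_{safe}"
```

Because the random part changes on every call, two uploads of `a.txt` by the same user get different keys, and the original name stays readable at the end of the key.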
109 changes: 0 additions & 109 deletions backend/app/api/routes/items.py

This file was deleted.

4 changes: 2 additions & 2 deletions backend/app/api/routes/users.py
@@ -13,7 +13,7 @@
 from app.core.config import settings
 from app.core.security import get_password_hash, verify_password
 from app.models import (
-    Item,
+    Document,
     Message,
     UpdatePassword,
     User,
@@ -219,7 +219,7 @@ def delete_user(
         raise HTTPException(
             status_code=403, detail="Super users are not allowed to delete themselves"
         )
-    statement = delete(Item).where(col(Item.owner_id) == user_id)
+    statement = delete(Document).where(col(Document.owner_id) == user_id)
     session.exec(statement)  # type: ignore
     session.delete(user)
     session.commit()
5 changes: 5 additions & 0 deletions backend/app/core/config.py
@@ -57,6 +57,11 @@ def all_cors_origins(self) -> list[str]:
     POSTGRES_PASSWORD: str = ""
     POSTGRES_DB: str = ""

+    AWS_ACCESS_KEY_ID: str = ""
+    AWS_SECRET_ACCESS_KEY: str = ""
+    AWS_REGION: str = ""
+    S3_BUCKET_NAME: str = ""
+
     @computed_field  # type: ignore[prop-decorator]
     @property
     def SQLALCHEMY_DATABASE_URI(self) -> PostgresDsn:
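The workflows in this diff export `S3_BUCKET`, while the settings class declares `S3_BUCKET_NAME`. If `Settings` is a pydantic-settings `BaseSettings` class (as `computed_field` and `PostgresDsn` suggest), fields are filled from environment variables of the same name by default. Unless the value is remapped somewhere outside this excerpt (for example in docker-compose), the bucket setting would then keep its empty default. A minimal sketch of a startup check that would surface this early follows; `missing_settings` and `REQUIRED_S3_SETTINGS` are hypothetical names, not part of the PR.

```python
from typing import Iterable, Mapping

# Names taken from the settings fields added in this diff.
REQUIRED_S3_SETTINGS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "S3_BUCKET_NAME",
)


def missing_settings(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_S3_SETTINGS
) -> list[str]:
    """Return the required names that are absent or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]
```

Called with `os.environ` at startup, a non-empty result could be logged or turned into a hard failure before the first upload is attempted.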
5 changes: 2 additions & 3 deletions backend/app/core/db.py
@@ -1,4 +1,4 @@
-from sqlmodel import Session, create_engine, select
+from sqlmodel import Session, SQLModel, create_engine, select

 from app import crud
 from app.core.config import settings
@@ -16,10 +16,9 @@ def init_db(session: Session) -> None:
     # Tables should be created with Alembic migrations
     # But if you don't want to use migrations, create
     # the tables un-commenting the next lines
-    # from sqlmodel import SQLModel

     # This works because the models are already imported and registered from app.models
-    # SQLModel.metadata.create_all(engine)
+    SQLModel.metadata.create_all(engine)

     user = session.exec(
         select(User).where(User.email == settings.FIRST_SUPERUSER)
27 changes: 27 additions & 0 deletions backend/app/core/extractors.py
@@ -0,0 +1,27 @@
import os
import tempfile
from app.core.db import engine
from sqlmodel import Session, select
import textract
import requests
from app.models import Document
from app.s3 import extract_text_from_s3_file

def extract_text_and_save_to_db(s3_url: str, document_id: str) -> None:
    try:
        with Session(engine) as session:
            text = extract_text_from_s3_file(s3_url)

            document_query = select(Document).where(Document.id == document_id)
            document = session.exec(document_query).first()

            if not document:
                raise Exception(f"Document with ID {document_id} not found")

            document.extracted_text = text
            session.add(document)
            session.commit()

    except Exception as e:
        print(f"Failed to extract and chunk text for document {document_id}: {e}")
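The task above extracts the text, loads the document, stores the result, and swallows every exception after printing it. Below is a minimal sketch of the same fetch, extract, persist flow with the store and extractor injected, so it can be exercised without S3 or a database. `save_extracted_text`, the dict-backed store, the up-front UUID parsing, and the use of `logging` instead of `print` are illustrative assumptions, not the PR's code.

```python
import logging
import uuid
from typing import Callable, Dict

logger = logging.getLogger(__name__)


def save_extracted_text(
    store: Dict[uuid.UUID, dict],
    extract: Callable[[str], str],
    s3_url: str,
    document_id: str,
) -> bool:
    """Hypothetical stand-in for the background task; returns True on success."""
    try:
        # Parse the id up front so a malformed value fails before any I/O happens.
        doc_uuid = uuid.UUID(document_id)
        document = store.get(doc_uuid)
        if document is None:
            raise LookupError(f"Document with ID {document_id} not found")
        document["extracted_text"] = extract(s3_url)
        return True
    except Exception:
        # A background task has no caller to report to, so log with the traceback.
        logger.exception("Failed to extract text for document %s", document_id)
        return False
```

Returning a boolean is only for the sketch's benefit; a FastAPI background task ignores the return value, so the log line remains the only failure signal.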
