
Commit 051a974

mrjasonroy and Copilot authored
feat: s3 storage and richer file support (cgoinglove#301)
## Summary

- Stand up a production-ready S3 storage driver: presigned PUT upload flow, head/tail download helpers, a CLI smoke check, new environment variables, and doc updates for teams migrating off Vercel Blob.
- Rebuild the upload UX into a thread-scoped, multi-file pipeline with a drag-and-drop overlay, retry-aware queueing, per-file progress, and consistent metadata fan-out to chat state, the clipboard, and API payloads.
- Normalize attachment rendering so user and assistant bubbles are styled correctly, filenames truncate, file-type badges appear, and download affordances stay obvious, even when multiple cards stack together.
- Tighten the file-handling contract: only image and PDF MIME types flow through as file parts; everything else gracefully downgrades to a source-url card so models never receive unsupported media. CSV is special-cased with a hidden markdown preview that the model consumes while the UI stays clean.
- Advertise per-model MIME capabilities through customModelProvider, so the composer can warn early and unsupported models fall back without breaking tool calls.
- Auto-ingest CSV attachments by streaming server-side previews into chat history so agents can immediately cite tabular context. Added coverage for preview formatting and ingestion gating.
- Document the new storage story (docs/storage/s3-setup.md) and Codex agent metadata (AGENTS.md), and extend unit tests across storage utils, ingestion, and MIME support.

### File Handling Model

- Provider MIME gates: the client now trusts explicit allowlists attached to each model. If a provider (e.g., OpenAI) cannot ingest a media type, we flip the part to source-url so the LLM sees a link instead of throwing "functionality not supported." This keeps uploads future-proof and highlights the narrow set of formats we genuinely support today (images and PDF).
- CSV ingestion preview: CSV files render an ingestionPreview text part that the UI suppresses but the model consumes, giving instant structured context without polluting the chat transcript.
- Threaded uploader: each thread maintains its own queue, enabling parallel uploads, progress tracking, and safe retry semantics. Drag-and-drop, button uploads, and clipboard pastes all flow through a single hook so we don't double-handle state.
- UI parity: file bubbles respect author context, stay readable through truncation and badges, and keep download buttons consistent even when multiple attachments stack.

## Screenshots / Recordings

### Drag and Drop

https://github.com/user-attachments/assets/1346b0e7-fa86-4a8f-abee-bd69542c93b9

### Multi File Drag and Drop

https://github.com/user-attachments/assets/d3e3bd27-6cee-4ac4-86f6-b08bc1ce5973

### CSV Preview

https://github.com/user-attachments/assets/9382005f-ea50-4377-9e15-34d3bb88c7fc

## Configuration Notes

- .env.example now documents FILE_STORAGE_TYPE (vercel-blob or s3), FILE_STORAGE_S3_BUCKET, FILE_STORAGE_S3_REGION, an optional CDN origin (FILE_STORAGE_S3_PUBLIC_BASE_URL), and other S3 knobs.
- The S3 driver uses AWS credentials from the environment or the default provider chain; fallback instructions live in docs/storage/s3-setup.md.
- CSV ingestion continues to use /api/storage/ingest; ensure your storage backend serves direct downloads for the preview step.

## Verification Guide

1. `pnpm dev`
2. Upload a mix of image, CSV, and PDF files via the picker and via drag-and-drop. Confirm badges, truncation, the download button, and per-thread queues.
3. Switch to a model that doesn't support file parts; ensure attachments render as "source-url" cards instead of failing tool calls.
4. Send a CSV attachment and confirm the assistant references the preview immediately, with no visible summary text in the chat.
5. (Optional) Point the app at S3 credentials, generate a presigned PUT (CLI script provided), upload through the UI, and confirm the file is present via the S3 console or `aws s3api head-object`.
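The provider MIME gate described under File Handling Model, and exercised by step 3 of the verification guide, can be sketched as a pure function. This is a minimal sketch under stated assumptions, not the code in this commit: the attachment fields (`url`, `mediaType`, `filename`) and the two part shapes mirror what `src/app/api/chat/route.ts` builds, while `toModelPart`, `isMimeAllowed`, and the `image/*` wildcard syntax are hypothetical.

```typescript
// Hypothetical attachment shape, mirroring the fields the chat route reads.
interface Attachment {
  url: string;
  mediaType: string;
  filename: string;
}

type FilePart = {
  type: "file";
  url: string;
  mediaType: string;
  filename: string;
};

type SourceUrlPart = {
  type: "source-url";
  url: string;
  mediaType: string;
  title: string;
};

// True when mediaType matches an allowlist entry exactly ("application/pdf")
// or by top-level wildcard ("image/*"). MIME parameters such as
// "; charset=utf-8" are ignored for matching.
function isMimeAllowed(mediaType: string, allowlist: readonly string[]): boolean {
  const normalized = mediaType.split(";")[0].trim().toLowerCase();
  return allowlist.some((entry) => {
    const pattern = entry.toLowerCase();
    if (pattern.endsWith("/*")) {
      // "image/*" -> "image/", so "imagex/..." cannot match by accident
      return normalized.startsWith(pattern.slice(0, -1));
    }
    return normalized === pattern;
  });
}

// Hypothetical gate: supported media passes through as a file part;
// everything else is downgraded to a source-url part (a link for the model).
function toModelPart(
  attachment: Attachment,
  allowlist: readonly string[],
): FilePart | SourceUrlPart {
  if (isMimeAllowed(attachment.mediaType, allowlist)) {
    return {
      type: "file",
      url: attachment.url,
      mediaType: attachment.mediaType,
      filename: attachment.filename,
    };
  }
  return {
    type: "source-url",
    url: attachment.url,
    mediaType: attachment.mediaType,
    title: attachment.filename,
  };
}
```

With an allowlist of `["image/*", "application/pdf"]`, a PNG or PDF stays a file part while a CSV becomes a source-url card. Keeping the gate pure means the same check could run in the composer (to warn early) and on the server (to enforce).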
## Tests

- `pnpm format`
- `pnpm lint`
- `pnpm check-types`
- `pnpm test`

## Documentation & Follow-up

- New S3 setup guide at docs/storage/s3-setup.md.
- Added AGENTS.md for Codex agent metadata.
- Future: add XLSX ingestion once a server-side parser/vectorization pipeline lands; capture upload telemetry once S3 analytics hooks are in place.

---------

Co-authored-by: Copilot <[email protected]>
1 parent 4513ac0 commit 051a974

38 files changed (+1855, -205 lines)

.env.example

Lines changed: 18 additions & 1 deletion
```diff
@@ -102,8 +102,25 @@ MCP_MAX_TOTAL_TIMEOUT=
 # BLOB_READ_WRITE_TOKEN=
 
 
-# -- S3 (planned driver) --
+# -- S3 --
 # FILE_STORAGE_TYPE=s3
 # FILE_STORAGE_PREFIX=uploads
 # FILE_STORAGE_S3_BUCKET=
 # FILE_STORAGE_S3_REGION=
+# Optional: Use when serving files via CDN/custom domain
+# FILE_STORAGE_S3_PUBLIC_BASE_URL=https://cdn.example.com
+# Optional: For S3-compatible endpoints (e.g., MinIO)
+# FILE_STORAGE_S3_ENDPOINT=http://localhost:9000
+# Optional: Force path-style URLs (1/true to enable)
+# FILE_STORAGE_S3_FORCE_PATH_STYLE=1
+
+
+# AWS Credentials (server only)
+# The AWS SDK automatically discovers credentials in this order:
+# 1) Environment variables below, 2) ~/.aws/credentials or AWS_PROFILE,
+# 3) IAM role attached to the runtime (EC2/ECS/EKS/Lambda).
+# You do NOT need to set these when using an IAM role.
+# AWS_ACCESS_KEY_ID=
+# AWS_SECRET_ACCESS_KEY=
+# AWS_SESSION_TOKEN=
+# AWS_REGION=us-east-1
```
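A minimal sketch of reading the S3 variables above into a typed config. It follows the semantics stated in the comments (`FILE_STORAGE_S3_FORCE_PATH_STYLE` accepts `1` or `true`) and adds two assumptions of its own: the region falls back to `AWS_REGION`, as docs/storage/s3-setup.md suggests, and the prefix defaults to `uploads`. `readS3Config` and `S3StorageConfig` are hypothetical names, not the driver shipped in this commit. Credentials are deliberately left out because the AWS SDK's default provider chain resolves them.

```typescript
// Hypothetical typed view of the S3 variables documented in .env.example.
interface S3StorageConfig {
  bucket: string;
  region: string;
  prefix: string;
  publicBaseUrl?: string; // CDN/custom domain used to build public URLs
  endpoint?: string; // S3-compatible endpoint such as MinIO
  forcePathStyle: boolean;
}

type Env = Record<string, string | undefined>;

// "1" and "true" (case-insensitive) enable a flag; anything else disables it.
function parseFlag(value: string | undefined): boolean {
  if (!value) return false;
  const v = value.trim().toLowerCase();
  return v === "1" || v === "true";
}

function readS3Config(env: Env): S3StorageConfig {
  if (env.FILE_STORAGE_TYPE !== "s3") {
    throw new Error(
      `FILE_STORAGE_TYPE must be "s3", got ${env.FILE_STORAGE_TYPE ?? "undefined"}`,
    );
  }
  const bucket = env.FILE_STORAGE_S3_BUCKET?.trim();
  if (!bucket) throw new Error("FILE_STORAGE_S3_BUCKET is required");

  // Assumption: fall back to AWS_REGION when the storage-specific region is unset.
  const region = env.FILE_STORAGE_S3_REGION?.trim() || env.AWS_REGION?.trim();
  if (!region) throw new Error("Set FILE_STORAGE_S3_REGION or AWS_REGION");

  return {
    bucket,
    region,
    // Assumption: default to the example prefix when none is configured.
    prefix: env.FILE_STORAGE_PREFIX?.trim() || "uploads",
    publicBaseUrl: env.FILE_STORAGE_S3_PUBLIC_BASE_URL?.trim() || undefined,
    endpoint: env.FILE_STORAGE_S3_ENDPOINT?.trim() || undefined,
    forcePathStyle: parseFlag(env.FILE_STORAGE_S3_FORCE_PATH_STYLE),
  };
}
```

On the server this would be called as `readS3Config(process.env)`; failing fast at startup surfaces a missing bucket or region before the first upload does.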

AGENTS.md

Lines changed: 42 additions & 0 deletions
# Repository Guidelines

## Project Structure & Module Organization
- App code lives in `src`.
  - `src/app` (Next.js routes, API, middleware)
  - `src/components` (UI; reusable components in PascalCase)
  - `src/lib` (helpers: auth, db, ai, validations, etc.)
  - `src/hooks` (React hooks: `useX`)
- Assets in `public/`. End‑to‑end tests in `tests/`. Scripts in `scripts/`. Docker files in `docker/`.

## Build, Test, and Development Commands
- `pnpm dev` — Run the app locally (Next.js dev server).
- `pnpm build` / `pnpm start` — Production build and run.
- `pnpm lint` / `pnpm lint:fix` — ESLint + Biome checks and autofix.
- `pnpm format` — Format with Biome.
- `pnpm test` / `pnpm test:watch` — Unit tests (Vitest).
- `pnpm test:e2e` — Playwright tests; uses `playwright.config.ts` webServer.
- DB: `pnpm db:push`, `pnpm db:studio`, `pnpm db:migrate` (Drizzle Kit).
- Docker: `pnpm docker-compose:up` / `:down` to run local stack.

## Coding Style & Naming Conventions
- TypeScript everywhere. Prefer `zod` for validation.
- Formatting via Biome: 2 spaces, LF, width 80, double quotes.
- Components: `PascalCase.tsx`; hooks/utilities: `camelCase.ts`.
- Co-locate small module tests next to code; larger suites under `tests/`.
- Keep modules focused; avoid circular deps; use `src/lib` for shared logic.

## Testing Guidelines
- Unit tests: Vitest, filename `*.test.ts(x)`.
- E2E: Playwright under `tests/`, filename `*.spec.ts`.
- Run locally: `pnpm test` and `pnpm test:e2e` (ensure app is running or let Playwright start via config).
- Add tests for new features and bug fixes; cover happy path + one failure mode.

## Commit & Pull Request Guidelines
- Conventional Commits: `feat:`, `fix:`, `chore:`, `docs:`, etc. Example: `feat: add image generation tool`.
- Branch names: `feat/…`, `fix/…`, `chore/…`.
- PRs: clear description, linked issues, screenshots or terminal output when UI/CLI changes; list test coverage and manual steps.
- Before opening PR: `pnpm check` (lint+types+tests) should pass.

## Security & Configuration Tips
- Copy `.env.example` to `.env`; never commit secrets. For local HTTP use `NO_HTTPS=1` or `pnpm build:local`.
- If using DB/Redis locally, start services via Docker scripts or your own stack.
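The commit and branch conventions in these guidelines are easy to check mechanically, for example from a commit-msg hook. A minimal sketch with hypothetical helpers (`isConventionalSubject`, `isValidBranchName`); the commit types beyond `feat`, `fix`, `chore`, and `docs` are an assumption drawn from common Conventional Commits usage, and the branch check accepts only the three prefixes listed.

```typescript
// Hypothetical checks for the conventions listed in AGENTS.md.
// Types beyond feat/fix/chore/docs are an assumption (common usage).
const COMMIT_TYPES = [
  "feat", "fix", "chore", "docs", "refactor", "test",
  "perf", "build", "ci", "style", "revert",
];
const BRANCH_TYPES = ["feat", "fix", "chore"];

// <type>[(scope)][!]: <description>, e.g. "feat(storage): add S3 driver"
const SUBJECT_RE = new RegExp(
  `^(${COMMIT_TYPES.join("|")})(\\([^()\\s]+\\))?!?: \\S.*$`,
);
// <type>/<name without whitespace>, e.g. "feat/s3-storage"
const BRANCH_RE = new RegExp(`^(${BRANCH_TYPES.join("|")})/\\S+$`);

function isConventionalSubject(subject: string): boolean {
  return SUBJECT_RE.test(subject.trim());
}

function isValidBranchName(branch: string): boolean {
  return BRANCH_RE.test(branch);
}
```

Both the guideline's own example, `feat: add image generation tool`, and this commit's title pass the subject check.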

docs/storage/s3-setup.md

Lines changed: 61 additions & 0 deletions
# S3 Storage Setup

This app supports S3 for file uploads (dev/prod). Development can rely on presigned PUTs directly from the browser, while production should keep the bucket private and serve via CDN (CloudFront + Origin Access Control) or signed GET URLs.

## Buckets
- Pick a region (e.g., `us-east-2`)
- Dev/Test example: `better-chatbot-dev` (public GET on `uploads/` only if needed)
- Prod example: `better-chatbot-prod` (private)
- Enable default encryption (SSE-S3) and versioning on both buckets.

## CORS
- Dev bucket: allow PUT/GET/HEAD from the origins you use locally and in staging, for example:
  - `http://localhost:3000`, `http://127.0.0.1:3000`
  - `https://staging.your-domain.com`, `http://staging.your-domain.com`
- Prod bucket: allow GET/HEAD only from your production domain (e.g., `https://app.your-domain.com`). Avoid enabling browser PUT in production.
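The CORS guidance in this guide maps onto S3's bucket CORS configuration, where each rule lists `AllowedOrigins`, `AllowedMethods`, and optionally `AllowedHeaders`, `ExposeHeaders`, and `MaxAgeSeconds`. A minimal sketch that builds the dev and prod rule sets as plain objects; `buildCorsConfiguration`, the `MaxAgeSeconds` value, and exposing `ETag` in dev are hypothetical choices, not part of this commit.

```typescript
type CorsMethod = "GET" | "PUT" | "POST" | "DELETE" | "HEAD";

// Subset of the fields S3 accepts in one CORS rule.
interface CorsRule {
  AllowedOrigins: string[];
  AllowedMethods: CorsMethod[];
  AllowedHeaders?: string[];
  ExposeHeaders?: string[];
  MaxAgeSeconds?: number;
}

interface CorsConfiguration {
  CORSRules: CorsRule[];
}

// Hypothetical builder: dev allows browser PUT for presigned uploads;
// prod stays read-only, as the guide recommends.
function buildCorsConfiguration(
  stage: "dev" | "prod",
  origins: string[],
): CorsConfiguration {
  if (origins.length === 0) {
    throw new Error("At least one allowed origin is required");
  }
  const rule: CorsRule = {
    AllowedOrigins: [...origins],
    AllowedMethods: stage === "dev" ? ["PUT", "GET", "HEAD"] : ["GET", "HEAD"],
    MaxAgeSeconds: 3000, // assumption: let browsers cache preflight results
  };
  if (stage === "dev") {
    // A cross-origin PUT always triggers a preflight, and its Content-Type
    // (e.g. image/png) is not CORS-safelisted, so request headers must be allowed.
    rule.AllowedHeaders = ["*"];
    // Assumption: expose ETag so browser code can read it after an upload.
    rule.ExposeHeaders = ["ETag"];
  }
  return { CORSRules: [rule] };
}
```

Serialized with `JSON.stringify`, the result has the document shape that `aws s3api put-bucket-cors --bucket better-chatbot-dev --cors-configuration file://cors.json` expects.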

## Dev public-read policy (prefix-only)
Grant public GET for the `uploads/` prefix on the dev bucket only if you need unauthenticated downloads:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPublicReadForUploadsPrefix",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::better-chatbot-dev/uploads/*"
    }
  ]
}
```

## IAM (app runtime)
Least privilege for app role/user:
- Actions: `s3:PutObject`, `s3:GetObject`, `s3:DeleteObject`, `s3:HeadObject`
- Resources: `arn:aws:s3:::<bucket-name>/uploads/*`
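The least-privilege grant above can be written as an identity policy document. A minimal sketch, assuming the `uploads/` prefix; `buildAppRuntimePolicy` is a hypothetical helper. One caveat worth knowing: IAM has no separate `s3:HeadObject` action. S3 authorizes `HeadObject` requests with the `s3:GetObject` permission, so the sketch grants only `s3:PutObject`, `s3:GetObject`, and `s3:DeleteObject`.

```typescript
interface PolicyStatement {
  Sid?: string;
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Hypothetical helper: object-level permissions scoped to one bucket prefix.
// HEAD requests are covered by s3:GetObject, so no extra action is listed.
function buildAppRuntimePolicy(bucket: string, prefix = "uploads"): PolicyDocument {
  // Strip leading/trailing slashes so "/uploads/" and "uploads" agree.
  const cleanPrefix = prefix.replace(/^\/+|\/+$/g, "");
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "AppObjectAccess",
        Effect: "Allow",
        Action: ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
        Resource: [`arn:aws:s3:::${bucket}/${cleanPrefix}/*`],
      },
    ],
  };
}
```

`JSON.stringify(buildAppRuntimePolicy("better-chatbot-prod"), null, 2)` yields a document that can be attached to the app's role or user. Because this policy omits `s3:ListBucket`, S3 answers a HEAD for a missing key with 403 rather than 404, so an existence check built on head-object should treat 403 accordingly.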

## Env configuration
- Dev/local:
  - `FILE_STORAGE_TYPE=s3`
  - `FILE_STORAGE_PREFIX=uploads`
  - `FILE_STORAGE_S3_BUCKET=better-chatbot-dev`
  - `FILE_STORAGE_S3_REGION=us-east-2` (or set `AWS_REGION`)
  - Use AWS SSO/profile or `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`
- Prod:
  - `FILE_STORAGE_S3_BUCKET=better-chatbot-prod`
  - Prefer CloudFront; set `FILE_STORAGE_S3_PUBLIC_BASE_URL=https://<cdn-domain>`

## Verify locally
- Ensure `aws sso login --profile <your_profile>` (or credentials are already available).
- Test presign script:
  ```shell
  AWS_PROFILE=<your_profile> \
  FILE_STORAGE_TYPE=s3 \
  FILE_STORAGE_S3_BUCKET=better-chatbot-dev \
  FILE_STORAGE_S3_REGION=us-east-2 \
  pnpm tsx scripts/verify-s3-upload-url.ts
  ```
- You should get `{ directUploadSupported: true, url, key, method: PUT }`.
- Upload with curl (optional): `curl -X PUT -H "Content-Type: image/png" --data-binary @file.png "<url>"`.
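The curl upload has a direct browser equivalent: a `fetch` PUT of the raw file bytes to the presigned URL, sending a `Content-Type` that matches the one the URL was signed with (when Content-Type is a signed header, a mismatch is typically rejected with a signature error). A minimal sketch; `uploadToPresignedUrl` is hypothetical, and the injectable `fetchImpl` parameter exists only so the function can be exercised without network access (in the browser it would simply wrap `fetch`).

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface MinimalResponse {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
}

interface PutInit<TBody> {
  method: "PUT";
  headers: Record<string, string>;
  body: TBody;
}

// Hypothetical helper: PUT raw bytes to a presigned S3 URL.
// In the browser `body` would be a File/Blob and `fetchImpl` would call fetch.
async function uploadToPresignedUrl<TBody>(
  url: string,
  body: TBody,
  contentType: string,
  fetchImpl: (url: string, init: PutInit<TBody>) => Promise<MinimalResponse>,
): Promise<{ etag: string | null }> {
  const res = await fetchImpl(url, {
    method: "PUT",
    // Should match the Content-Type used when the URL was presigned.
    headers: { "Content-Type": contentType },
    body,
  });
  if (!res.ok) {
    throw new Error(`Presigned PUT failed with HTTP ${res.status}`);
  }
  // S3 returns the object's ETag on success; cross-origin code can only read
  // it if the bucket's CORS rule lists ETag in ExposeHeaders.
  return { etag: res.headers.get("ETag") };
}
```

A non-2xx response surfaces as an error carrying the HTTP status, which is usually enough to distinguish an expired URL or signature mismatch (403) from other failures.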

messages/en.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -194,7 +194,7 @@
   "Chat": {
     "Error": "Chat Error",
     "thisMessageWasNotSavedPleaseTryTheChatAgain": "This message was not saved. Please try the chat again.",
-    "uploadImage": "Upload Image",
+    "uploadImage": "Upload File",
     "generateImage": "Generate Image",
     "imageUploadedSuccessfully": "Image uploaded successfully",
     "pleaseUploadImageFile": "Please upload an image file",
```

messages/es.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -74,7 +74,7 @@
   "Chat": {
     "Error": "Error de Chat",
     "thisMessageWasNotSavedPleaseTryTheChatAgain": "Este mensaje no se guardó. Por favor, intenta el chat nuevamente.",
-    "uploadImage": "Subir Imagen",
+    "uploadImage": "Subir archivo",
     "generateImage": "Generar Imagen",
     "imageUploadedSuccessfully": "Imagen subida exitosamente",
     "pleaseUploadImageFile": "Por favor sube un archivo de imagen",
```

messages/fr.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -74,7 +74,7 @@
   "Chat": {
     "Error": "Erreur de Chat",
     "thisMessageWasNotSavedPleaseTryTheChatAgain": "Ce message n'a pas été enregistré. Veuillez réessayer le chat.",
-    "uploadImage": "Télécharger une Image",
+    "uploadImage": "Téléverser un fichier",
     "generateImage": "Générer une Image",
     "imageUploadedSuccessfully": "Image téléchargée avec succès",
     "pleaseUploadImageFile": "Veuillez télécharger un fichier image",
```

messages/ja.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -74,7 +74,7 @@
   "Chat": {
     "Error": "チャットエラー",
     "thisMessageWasNotSavedPleaseTryTheChatAgain": "このメッセージは保存されませんでした。もう一度チャットをお試しください。",
-    "uploadImage": "画像をアップロード",
+    "uploadImage": "ファイルをアップロード",
     "generateImage": "画像を生成",
     "imageUploadedSuccessfully": "画像が正常にアップロードされました",
     "pleaseUploadImageFile": "画像ファイルをアップロードしてください",
```

messages/ko.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -75,7 +75,7 @@
   "Chat": {
     "Error": "채팅 오류",
     "thisMessageWasNotSavedPleaseTryTheChatAgain": "이 메시지는 저장되지 않았습니다. 다시 시도해주세요.",
-    "uploadImage": "이미지 업로드",
+    "uploadImage": "파일 업로드",
     "generateImage": "이미지 만들기",
     "imageUploadedSuccessfully": "이미지가 성공적으로 업로드되었습니다",
     "pleaseUploadImageFile": "이미지 파일을 업로드해주세요",
```

messages/zh.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -75,7 +75,7 @@
   "Chat": {
     "Error": "聊天错误",
     "thisMessageWasNotSavedPleaseTryTheChatAgain": "此消息未保存。请重试聊天。",
-    "uploadImage": "上传图片",
+    "uploadImage": "上传文件",
     "generateImage": "生成图片",
     "imageUploadedSuccessfully": "图片上传成功",
     "pleaseUploadImageFile": "请上传图片文件",
```

src/app/api/chat/route.ts

Lines changed: 67 additions & 0 deletions
```diff
@@ -49,6 +49,8 @@ import { colorize } from "consola/utils";
 import { generateUUID } from "lib/utils";
 import { nanoBananaTool, openaiImageTool } from "lib/ai/tools/image";
 import { ImageToolName } from "lib/ai/tools";
+import { buildCsvIngestionPreviewParts } from "@/lib/ai/ingest/csv-ingest";
+import { serverFileStorage } from "lib/file-storage";
 
 const logger = globalLogger.withDefaults({
   message: colorize("blackBright", `Chat API: `),
@@ -72,6 +74,7 @@ export async function POST(request: Request) {
       allowedMcpServers,
       imageTool,
       mentions = [],
+      attachments = [],
     } = chatApiSchemaRequestBodySchema.parse(json);
 
     const model = customModelProvider.getModel(chatModel);
@@ -104,6 +107,70 @@ export async function POST(request: Request) {
     if (messages.at(-1)?.id == message.id) {
       messages.pop();
     }
+    const ingestionPreviewParts = await buildCsvIngestionPreviewParts(
+      attachments,
+      (key) => serverFileStorage.download(key),
+    );
+    if (ingestionPreviewParts.length) {
+      const baseParts = [...message.parts];
+      let insertionIndex = -1;
+      for (let i = baseParts.length - 1; i >= 0; i -= 1) {
+        if (baseParts[i]?.type === "text") {
+          insertionIndex = i;
+          break;
+        }
+      }
+      if (insertionIndex !== -1) {
+        baseParts.splice(insertionIndex, 0, ...ingestionPreviewParts);
+        message.parts = baseParts;
+      } else {
+        message.parts = [...baseParts, ...ingestionPreviewParts];
+      }
+    }
+
+    if (attachments.length) {
+      const firstTextIndex = message.parts.findIndex(
+        (part: any) => part?.type === "text",
+      );
+      const attachmentParts: any[] = [];
+
+      attachments.forEach((attachment) => {
+        const exists = message.parts.some(
+          (part: any) =>
+            part?.type === attachment.type && part?.url === attachment.url,
+        );
+        if (exists) return;
+
+        if (attachment.type === "file") {
+          attachmentParts.push({
+            type: "file",
+            url: attachment.url,
+            mediaType: attachment.mediaType,
+            filename: attachment.filename,
+          });
+        } else if (attachment.type === "source-url") {
+          attachmentParts.push({
+            type: "source-url",
+            url: attachment.url,
+            mediaType: attachment.mediaType,
+            title: attachment.filename,
+          });
+        }
+      });
+
+      if (attachmentParts.length) {
+        if (firstTextIndex >= 0) {
+          message.parts = [
+            ...message.parts.slice(0, firstTextIndex),
+            ...attachmentParts,
+            ...message.parts.slice(firstTextIndex),
+          ];
+        } else {
+          message.parts = [...message.parts, ...attachmentParts];
+        }
+      }
+    }
+
     messages.push(message);
 
     const supportToolCall = !isToolCallUnsupportedModel(model);
```
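`buildCsvIngestionPreviewParts` lives in `@/lib/ai/ingest/csv-ingest`, which the route imports but this diff does not show. As a hedged illustration of the idea the PR describes (a markdown preview the model reads but the UI hides), here is a minimal sketch that turns CSV text into a capped markdown-table text part. Everything in it is an assumption: the `csvToPreviewPart` name, the `maxRows` default, the `ingestionPreview` flag on the part, and the simplified RFC 4180-style parser (quoted fields, doubled quotes, commas and newlines inside quotes).

```typescript
// Hypothetical text part with a flag the UI could use to hide it.
interface PreviewTextPart {
  type: "text";
  text: string;
  ingestionPreview: true;
}

// Simplified RFC 4180-style parser: quoted fields, "" escapes, and
// commas/newlines inside quotes. Not a complete CSV implementation.
function parseCsv(input: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < input.length; i += 1) {
    const ch = input[i];
    if (inQuotes) {
      if (ch === '"') {
        if (input[i + 1] === '"') {
          field += '"';
          i += 1; // skip the second quote of an escaped pair
        } else {
          inQuotes = false;
        }
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && input[i + 1] === "\n") i += 1; // CRLF is one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }
  // Flush the final row unless the input ended with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Escape pipes and flatten newlines so a value stays inside one table cell.
function cell(value: string): string {
  return value.replace(/\|/g, "\\|").replace(/\r?\n/g, " ").trim();
}

// Render the header plus up to `maxRows` data rows as a markdown table.
function csvToPreviewPart(
  filename: string,
  csvText: string,
  maxRows = 20,
): PreviewTextPart | null {
  const rows = parseCsv(csvText).filter((r) => r.some((c) => c.trim() !== ""));
  if (rows.length === 0) return null;
  const [header, ...body] = rows;
  const width = header.length;
  const shown = body.slice(0, maxRows);
  // Pad or truncate each row to the header width so columns line up.
  const line = (cells: string[]) =>
    `| ${Array.from({ length: width }, (_, i) => cell(cells[i] ?? "")).join(" | ")} |`;
  const table = [
    line(header),
    `| ${new Array<string>(width).fill("---").join(" | ")} |`,
    ...shown.map((r) => line(r)),
  ].join("\n");
  const note =
    body.length > shown.length
      ? `\n\n(showing ${shown.length} of ${body.length} data rows)`
      : "";
  return {
    type: "text",
    text: `CSV preview of ${filename}:\n\n${table}${note}`,
    ingestionPreview: true,
  };
}
```

In a flow like the route's, a helper of this kind would run on each downloaded CSV (after decoding the bytes to text), and the resulting parts would be spliced in ahead of the last text part, as the diff does with `ingestionPreviewParts`.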
