
Commit bcd97aa

📎 feat: Direct Provider Attachment Support for Multimodal Content (danny-avila#9994)

* 📎 feat: Direct Provider Attachment Support for Multimodal Content

* 📑 feat: Anthropic Direct Provider Upload (danny-avila#9072)

* feat: implement Anthropic native PDF support with document preservation
  - Add comprehensive debug logging throughout PDF processing pipeline
  - Refactor attachment processing to separate image and document handling
  - Create distinct addImageURLs(), addDocuments(), and processAttachments() methods
  - Fix critical bugs in stream handling and parameter passing
  - Add streamToBuffer utility for proper stream-to-buffer conversion
  - Remove api/agents submodule from repository

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <[email protected]>
* chore: remove out of scope formatting changes
* fix: stop duplication of file in chat on end of response stream
* chore: bring back file search and ocr options
* chore: localize upload to provider string in file menu
* refactor: change createMenuItems args to fit new pattern introduced by anthropic-native-pdf-support
* feat: add cache point for pdfs processed by anthropic endpoint since they are unlikely to change and should benefit from caching
* feat: combine Upload Image into Upload to Provider since they both perform direct upload and change provider upload icon to reflect multimodal upload
* feat: add citations support according to docs
* refactor: remove redundant 'document' check since documents are handled properly by formatMessage in the agents repo now
* refactor: change upload logic so anthropic endpoint isn't exempted from normal upload path using Agents for consistency with the rest of the upload logic
* fix: include width and height in return from uploadLocalFile so images are correctly identified when going through an AgentUpload in addImageURLs
* chore: remove client specific handling since the direct provider stuff is handled by the agent client
* feat: handle documents in AgentClient so no need for change to agents repo
* chore: removed unused changes
* chore: remove auto generated comments from OG commit
* feat: add logic for agents to use direct to provider uploads if supported (currently just anthropic)
* fix: reintroduce role check to fix render error because of undefined value for Content Part
* fix: actually fix render bug by using proper isCreatedByUser check and making sure our mutation of formattedMessage.content is consistent

---------

Co-authored-by: Andres Restrepo <[email protected]>
Co-authored-by: Claude <[email protected]>

feat: Send Attachments Directly to Provider (OpenAI) (danny-avila#9098)

* refactor: change references from direct upload to direct attach to better reflect functionality
  since we are just using base64 encoding strategy now rather than Files/File API for sending our attachments directly to the provider, the upload nomenclature no longer makes sense. direct_attach better describes the different methods of sending attachments to providers anyways even if we later introduce direct upload support
* feat: add upload to provider option for openai (and agent) ui
* chore: move anthropic pdf validator over to packages/api
* feat: simple pdf validation according to openai docs
* feat: add provider agnostic validatePdf logic to start handling multiple endpoints
* feat: add handling for openai specific documentPart formatting
* refactor: move require statement to proper place at top of file
* chore: add in openAI endpoint for the rest of the document handling logic
* feat: add direct attach support for azureOpenAI endpoint and agents
* feat: add pdf validation for azureOpenAI endpoint
* refactor: unify all the endpoint checks with isDocumentSupportedEndpoint
* refactor: consolidate Upload to Provider vs Upload image logic for clarity
* refactor: remove anthropic from anthropic_multimodal fileType since we support multiple providers now

🗂️ feat: Send Attachments Directly to Provider (Google) (danny-avila#9100)

* feat: add validation for google PDFs and add google endpoint as a document supporting endpoint
* feat: add proper pdf formatting for google endpoints (requires PR danny-avila#14 in agents)
* feat: add multimodal support for google endpoint attachments
* feat: add audio file svg
* fix: refactor attachments logic so multi-attachment messages work properly
* feat: add video file svg
* fix: allows for followup questions of uploaded multimodal attachments
* fix: remove incorrect final message filtering that was breaking Attachment component rendering

fix: manually rename 'documents' to 'Documents' in git since it wasn't picked up due to case insensitivity in dir name

fix: add logic so filepicker for a google agent has proper filetype filtering

🛫 refactor: Move Encoding Logic to packages/api (danny-avila#9182)

* refactor: move audio encode over to TS
* refactor: audio encoding now functional in LC again
* refactor: move video encode over to TS
* refactor: move document encode over to TS
* refactor: video encoding now functional in LC again
* refactor: document encoding now functional in LC again
* fix: extend file type options in AttachFileMenu to include 'google_multimodal' and update dependency array to include agent?.provider
* feat: only accept pdfs if responses api is enabled for openai convos

chore: address ESLint comments

chore: add missing audio mimetype

* fix: type safety for message content parts and improve null handling
* chore: reorder AttachFileMenuProps for consistency and clarity
* chore: import order in AttachFileMenu
* fix: improve null handling for text parts in parseTextParts function
* fix: remove no longer used unsupported capability error message for file uploads
* fix: OpenAI Direct File Attachment Format
* fix: update encodeAndFormatDocuments to support OpenAI responses API and enhance document result types
* refactor: broaden providers supported for documents
* feat: enhance DragDrop context and modal to support document uploads based on provider capabilities
* fix: reorder import statements for consistency in video encoding module

---------

Co-authored-by: Dustin Healy <[email protected]>
1 parent 9c77f53 commit bcd97aa

33 files changed, +1040 -74 lines changed


api/app/clients/BaseClient.js

Lines changed: 106 additions & 9 deletions
@@ -1,18 +1,24 @@
 const crypto = require('crypto');
 const fetch = require('node-fetch');
 const { logger } = require('@librechat/data-schemas');
-const { getBalanceConfig } = require('@librechat/api');
 const {
-  supportsBalanceCheck,
-  isAgentsEndpoint,
-  isParamEndpoint,
-  EModelEndpoint,
+  getBalanceConfig,
+  encodeAndFormatAudios,
+  encodeAndFormatVideos,
+  encodeAndFormatDocuments,
+} = require('@librechat/api');
+const {
+  Constants,
+  ErrorTypes,
   ContentTypes,
   excludedKeys,
-  ErrorTypes,
-  Constants,
+  EModelEndpoint,
+  isParamEndpoint,
+  isAgentsEndpoint,
+  supportsBalanceCheck,
 } = require('librechat-data-provider');
 const { getMessages, saveMessage, updateMessage, saveConvo, getConvo } = require('~/models');
+const { getStrategyFunctions } = require('~/server/services/Files/strategies');
 const { checkBalance } = require('~/models/balanceMethods');
 const { truncateToolCallOutputs } = require('./prompts');
 const { getFiles } = require('~/models/File');
@@ -1198,8 +1204,99 @@ class BaseClient {
     return await this.sendCompletion(payload, opts);
   }

+  async addDocuments(message, attachments) {
+    const documentResult = await encodeAndFormatDocuments(
+      this.options.req,
+      attachments,
+      {
+        provider: this.options.agent?.provider,
+        useResponsesApi: this.options.agent?.model_parameters?.useResponsesApi,
+      },
+      getStrategyFunctions,
+    );
+    message.documents =
+      documentResult.documents && documentResult.documents.length
+        ? documentResult.documents
+        : undefined;
+    return documentResult.files;
+  }
+
+  async addVideos(message, attachments) {
+    const videoResult = await encodeAndFormatVideos(
+      this.options.req,
+      attachments,
+      this.options.agent.provider,
+      getStrategyFunctions,
+    );
+    message.videos =
+      videoResult.videos && videoResult.videos.length ? videoResult.videos : undefined;
+    return videoResult.files;
+  }
+
+  async addAudios(message, attachments) {
+    const audioResult = await encodeAndFormatAudios(
+      this.options.req,
+      attachments,
+      this.options.agent.provider,
+      getStrategyFunctions,
+    );
+    message.audios =
+      audioResult.audios && audioResult.audios.length ? audioResult.audios : undefined;
+    return audioResult.files;
+  }
+
+  async processAttachments(message, attachments) {
+    const categorizedAttachments = {
+      images: [],
+      documents: [],
+      videos: [],
+      audios: [],
+    };
+
+    for (const file of attachments) {
+      if (file.type.startsWith('image/')) {
+        categorizedAttachments.images.push(file);
+      } else if (file.type === 'application/pdf') {
+        categorizedAttachments.documents.push(file);
+      } else if (file.type.startsWith('video/')) {
+        categorizedAttachments.videos.push(file);
+      } else if (file.type.startsWith('audio/')) {
+        categorizedAttachments.audios.push(file);
+      }
+    }
+
+    const [imageFiles, documentFiles, videoFiles, audioFiles] = await Promise.all([
+      categorizedAttachments.images.length > 0
+        ? this.addImageURLs(message, categorizedAttachments.images)
+        : Promise.resolve([]),
+      categorizedAttachments.documents.length > 0
+        ? this.addDocuments(message, categorizedAttachments.documents)
+        : Promise.resolve([]),
+      categorizedAttachments.videos.length > 0
+        ? this.addVideos(message, categorizedAttachments.videos)
+        : Promise.resolve([]),
+      categorizedAttachments.audios.length > 0
+        ? this.addAudios(message, categorizedAttachments.audios)
+        : Promise.resolve([]),
+    ]);
+
+    const allFiles = [...imageFiles, ...documentFiles, ...videoFiles, ...audioFiles];
+    const seenFileIds = new Set();
+    const uniqueFiles = [];
+
+    for (const file of allFiles) {
+      if (file.file_id && !seenFileIds.has(file.file_id)) {
+        seenFileIds.add(file.file_id);
+        uniqueFiles.push(file);
+      } else if (!file.file_id) {
+        uniqueFiles.push(file);
+      }
+    }
+
+    return uniqueFiles;
+  }
+
   /**
-   *
    * @param {TMessage[]} _messages
    * @returns {Promise<TMessage[]>}
    */
@@ -1248,7 +1345,7 @@ class BaseClient {
         {},
       );

-      await this.addImageURLs(message, files, this.visionMode);
+      await this.processAttachments(message, files);

       this.message_file_map[message.messageId] = files;
       return message;
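
The two pure steps inside `processAttachments` (grouping files by MIME type, then de-duplicating the merged results by `file_id`) can be sketched in isolation. This is a minimal illustration, not the code from the commit: `categorizeAttachments` and `dedupeByFileId` are hypothetical helper names, files are assumed to be plain objects with a `type` and an optional `file_id`, and the `?? ''` guard against a missing `type` is an added assumption that the diff itself does not include.

```javascript
// Hypothetical, simplified stand-ins for the logic in processAttachments.
// Files are assumed to be plain objects: { type: string, file_id?: string }.

/** Group attachments using the same MIME checks that appear in the diff. */
function categorizeAttachments(attachments) {
  const categorized = { images: [], documents: [], videos: [], audios: [] };
  for (const file of attachments) {
    // Assumption: tolerate a missing `type` (the diff calls startsWith directly).
    const type = file.type ?? '';
    if (type.startsWith('image/')) {
      categorized.images.push(file);
    } else if (type === 'application/pdf') {
      categorized.documents.push(file);
    } else if (type.startsWith('video/')) {
      categorized.videos.push(file);
    } else if (type.startsWith('audio/')) {
      categorized.audios.push(file);
    }
    // Any other type falls through and is not categorized, as in the diff.
  }
  return categorized;
}

/** Keep the first file seen for each file_id; files without an id are always kept. */
function dedupeByFileId(files) {
  const seenFileIds = new Set();
  const uniqueFiles = [];
  for (const file of files) {
    if (!file.file_id) {
      uniqueFiles.push(file);
    } else if (!seenFileIds.has(file.file_id)) {
      seenFileIds.add(file.file_id);
      uniqueFiles.push(file);
    }
  }
  return uniqueFiles;
}

// Usage: group a mixed batch, then de-duplicate it.
const sample = [
  { file_id: 'f1', type: 'image/png' },
  { file_id: 'f2', type: 'application/pdf' },
  { file_id: 'f1', type: 'image/png' },
];
console.log(categorizeAttachments(sample));
console.log(dedupeByFileId(sample));
```

Note that, as in the diff, only `application/pdf` counts as a document here; other document MIME types are left uncategorized.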

api/package.json

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@
     "@langchain/google-genai": "^0.2.13",
     "@langchain/google-vertexai": "^0.2.13",
     "@langchain/textsplitters": "^0.1.0",
-    "@librechat/agents": "^2.4.84",
+    "@librechat/agents": "^2.4.85",
     "@librechat/api": "*",
     "@librechat/data-schemas": "*",
     "@microsoft/microsoft-graph-client": "^3.0.7",

api/server/controllers/agents/client.js

Lines changed: 1 addition & 1 deletion
@@ -257,7 +257,7 @@ class AgentClient extends BaseClient {
       };
     }

-    const files = await this.addImageURLs(
+    const files = await this.processAttachments(
       orderedMessages[orderedMessages.length - 1],
       attachments,
     );

api/server/services/Files/Local/crud.js

Lines changed: 13 additions & 1 deletion
@@ -4,6 +4,7 @@ const axios = require('axios');
 const { logger } = require('@librechat/data-schemas');
 const { EModelEndpoint } = require('librechat-data-provider');
 const { generateShortLivedToken } = require('@librechat/api');
+const { resizeImageBuffer } = require('~/server/services/Files/images/resize');
 const { getBufferMetadata } = require('~/server/utils');
 const paths = require('~/config/paths');

@@ -286,7 +287,18 @@ async function uploadLocalFile({ req, file, file_id }) {
   await fs.promises.writeFile(newPath, inputBuffer);
   const filepath = path.posix.join('/', 'uploads', req.user.id, path.basename(newPath));

-  return { filepath, bytes };
+  let height, width;
+  if (file.mimetype && file.mimetype.startsWith('image/')) {
+    try {
+      const { width: imgWidth, height: imgHeight } = await resizeImageBuffer(inputBuffer, 'high');
+      height = imgHeight;
+      width = imgWidth;
+    } catch (error) {
+      logger.warn('[uploadLocalFile] Could not get image dimensions:', error.message);
+    }
+  }
+
+  return { filepath, bytes, height, width };
 }

 /**
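
The `uploadLocalFile` change follows a best-effort pattern: try to read image dimensions, and on failure log a warning and return the upload result without them. A minimal sketch of that pattern follows, assuming a hypothetical `probe` callback in place of `resizeImageBuffer` (whose exact return shape beyond `width` and `height` is not shown in the diff) and `console.warn` in place of the project logger.

```javascript
/**
 * Best-effort dimension lookup (hypothetical helper, not part of the commit).
 * `probe` is an assumed async function resolving to { width, height };
 * in the diff, resizeImageBuffer(inputBuffer, 'high') plays that role.
 */
async function getImageDimensions(buffer, mimetype, probe) {
  // Non-image uploads skip the probe entirely.
  if (!mimetype || !mimetype.startsWith('image/')) {
    return {};
  }
  try {
    const { width, height } = await probe(buffer);
    return { width, height };
  } catch (error) {
    // A failed probe must not fail the upload; dimensions stay undefined.
    console.warn('Could not get image dimensions:', error.message);
    return {};
  }
}

// Usage with a stub probe that always reports a fixed size.
getImageDimensions(Buffer.from('stub'), 'image/png', async () => ({ width: 640, height: 480 }))
  .then((dimensions) => console.log(dimensions));
```

The design choice worth noting is that the caller always gets an object back, so spreading it into the return value (for example `{ filepath, bytes, ...dimensions }`) leaves `width` and `height` undefined for non-images, matching the behavior of the diff.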

api/server/services/Files/process.js

Lines changed: 0 additions & 5 deletions
@@ -522,11 +522,6 @@ const processAgentFileUpload = async ({ req, res, metadata }) => {
   }

   const isImage = file.mimetype.startsWith('image');
-  if (!isImage && !tool_resource) {
-    /** Note: this needs to be removed when we can support files to providers */
-    throw new Error('No tool resource provided for non-image agent file upload');
-  }
-
   let fileInfoMetadata;
   const entity_id = messageAttachment === true ? undefined : agent_id;
   const basePath = mime.getType(file.originalname)?.startsWith('image') ? 'images' : 'uploads';

client/src/Providers/DragDropContext.tsx

Lines changed: 16 additions & 1 deletion
@@ -1,23 +1,38 @@
 import React, { createContext, useContext, useMemo } from 'react';
+import type { EModelEndpoint } from 'librechat-data-provider';
+import { useGetEndpointsQuery } from '~/data-provider';
+import { getEndpointField } from '~/utils/endpoints';
 import { useChatContext } from './ChatContext';

 interface DragDropContextValue {
   conversationId: string | null | undefined;
   agentId: string | null | undefined;
+  endpoint: string | null | undefined;
+  endpointType?: EModelEndpoint | undefined;
 }

 const DragDropContext = createContext<DragDropContextValue | undefined>(undefined);

 export function DragDropProvider({ children }: { children: React.ReactNode }) {
   const { conversation } = useChatContext();
+  const { data: endpointsConfig } = useGetEndpointsQuery();
+
+  const endpointType = useMemo(() => {
+    return (
+      getEndpointField(endpointsConfig, conversation?.endpoint, 'type') ||
+      (conversation?.endpoint as EModelEndpoint | undefined)
+    );
+  }, [conversation?.endpoint, endpointsConfig]);

   /** Context value only created when conversation fields change */
   const contextValue = useMemo<DragDropContextValue>(
     () => ({
       conversationId: conversation?.conversationId,
       agentId: conversation?.agent_id,
+      endpoint: conversation?.endpoint,
+      endpointType: endpointType,
     }),
-    [conversation?.conversationId, conversation?.agent_id],
+    [conversation?.conversationId, conversation?.agent_id, conversation?.endpoint, endpointType],
   );

   return <DragDropContext.Provider value={contextValue}>{children}</DragDropContext.Provider>;

client/src/components/Chat/Input/Files/AttachFileChat.tsx

Lines changed: 15 additions & 2 deletions
@@ -2,13 +2,15 @@ import { memo, useMemo } from 'react';
 import {
   Constants,
   supportsFiles,
+  EModelEndpoint,
   mergeFileConfig,
   isAgentsEndpoint,
   isAssistantsEndpoint,
   fileConfig as defaultFileConfig,
 } from 'librechat-data-provider';
 import type { EndpointFileConfig, TConversation } from 'librechat-data-provider';
-import { useGetFileConfig } from '~/data-provider';
+import { useGetFileConfig, useGetEndpointsQuery } from '~/data-provider';
+import { getEndpointField } from '~/utils/endpoints';
 import AttachFileMenu from './AttachFileMenu';
 import AttachFile from './AttachFile';

@@ -20,14 +22,23 @@ function AttachFileChat({
   conversation: TConversation | null;
 }) {
   const conversationId = conversation?.conversationId ?? Constants.NEW_CONVO;
-  const { endpoint, endpointType } = conversation ?? { endpoint: null };
+  const { endpoint } = conversation ?? { endpoint: null };
   const isAgents = useMemo(() => isAgentsEndpoint(endpoint), [endpoint]);
   const isAssistants = useMemo(() => isAssistantsEndpoint(endpoint), [endpoint]);

   const { data: fileConfig = defaultFileConfig } = useGetFileConfig({
     select: (data) => mergeFileConfig(data),
   });

+  const { data: endpointsConfig } = useGetEndpointsQuery();
+
+  const endpointType = useMemo(() => {
+    return (
+      getEndpointField(endpointsConfig, endpoint, 'type') ||
+      (endpoint as EModelEndpoint | undefined)
+    );
+  }, [endpoint, endpointsConfig]);
+
   const endpointFileConfig = fileConfig.endpoints[endpoint ?? ''] as EndpointFileConfig | undefined;
   const endpointSupportsFiles: boolean = supportsFiles[endpointType ?? endpoint ?? ''] ?? false;
   const isUploadDisabled = (disableInputs || endpointFileConfig?.disabled) ?? false;
@@ -37,7 +48,9 @@ function AttachFileChat({
   } else if (isAgents || (endpointSupportsFiles && !isUploadDisabled)) {
     return (
       <AttachFileMenu
+        endpoint={endpoint}
         disabled={disableInputs}
+        endpointType={endpointType}
         conversationId={conversationId}
         agentId={conversation?.agent_id}
         endpointFileConfig={endpointFileConfig}
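
Both `DragDropContext.tsx` and `AttachFileChat.tsx` now derive `endpointType` from the endpoints config and fall back to the endpoint name itself. The sketch below shows that fallback and the subsequent `supportsFiles` lookup in plain JavaScript. It rests on assumptions: `getEndpointFieldSketch` is a hypothetical stand-in for `getEndpointField` (whose implementation is not part of this diff), the config is assumed to be an object keyed by endpoint name, and the contents of `supportsFilesSketch` are invented for illustration.

```javascript
// Hypothetical stand-in for getEndpointField from ~/utils/endpoints.
// Assumes the endpoints config is an object keyed by endpoint name.
function getEndpointFieldSketch(endpointsConfig, endpoint, field) {
  if (!endpointsConfig || !endpoint) {
    return undefined;
  }
  return endpointsConfig[endpoint]?.[field];
}

/** Prefer the configured `type`; otherwise treat the endpoint name as its own type. */
function resolveEndpointType(endpointsConfig, endpoint) {
  // Unlike the diff, this sketch normalizes a null endpoint to undefined.
  return getEndpointFieldSketch(endpointsConfig, endpoint, 'type') || endpoint || undefined;
}

// Invented capability table; the real `supportsFiles` map comes from librechat-data-provider.
const supportsFilesSketch = { openAI: true, custom: true };

/** Mirrors `supportsFiles[endpointType ?? endpoint ?? ''] ?? false` from the diff. */
function endpointSupportsFiles(endpointsConfig, endpoint) {
  const endpointType = resolveEndpointType(endpointsConfig, endpoint);
  return supportsFilesSketch[endpointType ?? endpoint ?? ''] ?? false;
}

// Usage: a user-defined endpoint named "MyProxy" configured with type "custom".
const exampleConfig = { MyProxy: { type: 'custom' } };
console.log(resolveEndpointType(exampleConfig, 'MyProxy'));
console.log(endpointSupportsFiles(exampleConfig, 'MyProxy'));
```

Resolving the type from the config rather than from the conversation object means a custom-named endpoint is matched against capability tables by its configured type, not by its display name.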
