@acancino-gnv acancino-gnv commented Dec 24, 2025

Summary

  • Adds configurable file metadata exposure to LLMs when files are uploaded
  • Metadata (filename, mimeType, sizeBytes, etc.) is injected into messages sent to LLMs
  • Enables models to reference files by name and pass metadata to tools/MCP servers

Changes

  • Added FileMetadataFields enum with core and opt-in fields
  • Added fileMetadataConfigSchema for YAML configuration
  • Implemented formatFileMetadata() function with markdown/json/xml output formats
  • Updated extractFileContext() to inject metadata into messages
  • Added comprehensive test coverage

Configuration Example

fileConfig:
  metadata:
    enabled: true
    fields:
      - filename
      - type
      - bytes
      - source
      - filepath  # opt-in: exposes storage path
    format: markdown  # or json, xml
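The configuration above can be pictured as a small TypeScript shape plus a function that fills in omitted values. This is a minimal sketch, not the PR's actual Zod schema: `resolveMetadataConfig` and `DEFAULT_FIELDS` are hypothetical names, and treating a missing `enabled` as `false` is an assumption. Only the field names, the three formats, and the markdown default come from this description.

```typescript
// Hypothetical types mirroring the YAML above; the real definitions live in the PR.
type MetadataFormat = "markdown" | "json" | "xml";

interface FileMetadataConfig {
  enabled?: boolean;
  fields?: string[];
  format?: MetadataFormat;
}

interface ResolvedMetadataConfig {
  enabled: boolean;
  fields: string[];
  format: MetadataFormat;
}

// Safe defaults named in this PR: filename, type, bytes.
const DEFAULT_FIELDS: string[] = ["filename", "type", "bytes"];

// Fill in anything the user left out.
// Assumption: an absent `enabled` means the feature stays off.
function resolveMetadataConfig(config: FileMetadataConfig = {}): ResolvedMetadataConfig {
  const hasFields = config.fields !== undefined && config.fields.length > 0;
  return {
    enabled: config.enabled ?? false,
    fields: hasFields ? (config.fields as string[]).slice() : DEFAULT_FIELDS.slice(),
    format: config.format ?? "markdown",
  };
}
```

Under these assumptions, `resolveMetadataConfig({ enabled: true })` yields the three default fields in markdown format, and an explicit `fields` list replaces the defaults rather than extending them.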

Output Formats

Markdown (default):

**File Metadata:**
- **filename**: report.pdf
- **type**: application/pdf
- **bytes**: 1048576
- **size_human**: 1 MB

JSON:

{
  "filename": "report.pdf",
  "type": "application/pdf",
  "bytes": 1048576,
  "size_human": "1 MB"
}
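To make the two outputs above concrete, here is a hedged sketch of a formatter that produces them. It is not the PR's `formatFileMetadata()`; `renderMetadata` and `humanSize` are hypothetical names, and the rule of appending `size_human` right after `bytes` is inferred from the sample output.

```typescript
type SampleFormat = "markdown" | "json";

// Hypothetical record shape; keys follow the field names used in this PR.
type MetadataRecord = Record<string, string | number>;

const SIZE_UNITS = ["B", "KB", "MB", "GB", "TB", "PB"];

// Convert a byte count to a short label such as "1 MB" or "1.5 MB".
function humanSize(bytes: number): string {
  if (!isFinite(bytes) || bytes <= 0) return "0 B";
  let value = bytes;
  let unit = 0;
  // Divide by 1024 until the value fits the unit or the largest unit is reached.
  while (value >= 1024 && unit < SIZE_UNITS.length - 1) {
    value /= 1024;
    unit += 1;
  }
  // One decimal at most; parseFloat drops a trailing ".0".
  return `${parseFloat(value.toFixed(1))} ${SIZE_UNITS[unit]}`;
}

function renderMetadata(record: MetadataRecord, format: SampleFormat = "markdown"): string {
  // Copy the record, adding size_human directly after bytes.
  const expanded: MetadataRecord = {};
  for (const key of Object.keys(record)) {
    const value = record[key];
    expanded[key] = value;
    if (key === "bytes" && typeof value === "number") {
      expanded["size_human"] = humanSize(value);
    }
  }
  if (format === "json") {
    return JSON.stringify(expanded, null, 2);
  }
  const lines = Object.keys(expanded).map((key) => `- **${key}**: ${expanded[key]}`);
  return ["**File Metadata:**"].concat(lines).join("\n");
}
```

Calling `renderMetadata({ filename: "report.pdf", type: "application/pdf", bytes: 1048576 })` reproduces the markdown sample, and passing `"json"` as the second argument reproduces the JSON sample. Capping the unit index at the last entry keeps very large sizes from indexing past the unit table.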

Available Fields

| Field | Description | Default |
| --- | --- | --- |
| filename | Original filename | yes |
| type | MIME type | yes |
| bytes | Size in bytes (+ human readable) | yes |
| source | Storage backend (local, s3, azure, etc.) | |
| width / height | Image dimensions | |
| createdAt / updatedAt | Timestamps | |
| filepath | Full storage path | no (opt-in) |
| conversationId | Session identifier | no (opt-in) |
| file_id | Unique file ID | no (opt-in) |
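The opt-in rule can be sketched as a whitelist filter: a field reaches the model only if it is both a known field and explicitly listed in the configuration. The names `UploadedFile`, `KNOWN_FIELDS`, and `selectFields` below are hypothetical and only illustrate the idea; the PR's own selection logic may differ.

```typescript
// Hypothetical file record; property names follow the fields listed above.
interface UploadedFile {
  filename: string;
  type: string;
  bytes: number;
  source?: string;
  filepath?: string;
  conversationId?: string;
  file_id?: string;
}

type FieldName = keyof UploadedFile;

const KNOWN_FIELDS: FieldName[] = [
  "filename",
  "type",
  "bytes",
  "source",
  "filepath",
  "conversationId",
  "file_id",
];

// Type guard: is this configured name one of the fields we know how to expose?
function isKnownField(name: string): name is FieldName {
  return (KNOWN_FIELDS as string[]).indexOf(name) !== -1;
}

// Copy only the requested fields; anything not requested is never exposed.
function selectFields(file: UploadedFile, requested: string[]): Record<string, string | number> {
  const selected: Record<string, string | number> = {};
  for (const name of requested) {
    if (!isKnownField(name)) continue; // ignore unknown names from the config
    const value = file[name];
    if (value !== undefined) selected[name] = value;
  }
  return selected;
}
```

With this shape, a storage path stored on the file record stays out of the prompt unless `filepath` appears in the configured `fields` list.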

Test plan

  • Unit tests for formatFileMetadata() covering all formats and fields
  • Unit tests for extractFileContext() with metadata enabled/disabled
  • Manual testing with actual file uploads
  • Verify metadata appears in LLM context for OpenAI, Anthropic, Google endpoints

Copilot AI left a comment

Pull request overview

This PR adds a configurable file metadata exposure feature that allows LLMs to access file metadata (filename, type, size, etc.) when processing attachments. Metadata can be injected in markdown, JSON, or XML formats, with configurable field selection including both safe defaults and opt-in sensitive fields.

Key changes:

  • Adds type definitions and Zod schemas for file metadata configuration
  • Implements metadata formatting functions supporting markdown, JSON, and XML output
  • Integrates metadata injection into the file context extraction pipeline
  • Provides comprehensive test coverage for formatting and extraction logic

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| packages/data-provider/src/types/files.ts | Adds FileMetadataConfig type definition for metadata configuration |
| packages/data-provider/src/file-config.ts | Defines FileMetadataFields enum, default fields, Zod schema, and merging logic |
| packages/api/src/files/context.ts | Implements metadata formatting functions and integrates metadata injection into extractFileContext |
| packages/api/src/files/context.spec.ts | Adds comprehensive unit tests for metadata formatting and file context extraction |
| librechat.example.yaml | Documents configuration options with examples of safe and opt-in fields |

@alonso-cancino
@danny-avila I have addressed the Copilot comments. Could you review it again?

@alonso-cancino alonso-cancino force-pushed the feat/file_metadata_exposure_to_llms branch from f3d67b3 to 22d6b6d Compare December 27, 2025 15:14
@acancino-gnv acancino-gnv force-pushed the feat/file_metadata_exposure_to_llms branch from 22d6b6d to 99fb824 Compare December 29, 2025 14:51
acancino-gnv and others added 2 commits December 29, 2025 13:45
When users upload files, LLMs and tools/MCP servers can now access
key metadata (filename, mimeType, sizeBytes, etc.) through a new
configurable fileConfig.metadata option in librechat.yaml.

Features:
- New FileMetadataFields enum with core and opt-in fields
- Configurable output formats: markdown, json, xml
- Default fields: filename, type, bytes (safe defaults)
- Opt-in sensitive fields: filepath, conversationId, file_id
- Human-readable size formatting (e.g., "1.5 MB")
- Full test coverage for formatFileMetadata and extractFileContext

Configuration example:
```yaml
fileConfig:
  metadata:
    enabled: true
    fields: [filename, type, bytes, source, filepath]
    format: json
```

This enables LLMs to reference files by name and pass metadata
to tools when invoking them.

- Fix XML injection by escaping special characters in formatAsXml
- Fix formatBytes overflow for files >= 1TB (add TB/PB units)
- Remove unused hasTextFiles variable
- Add token limit for metadata to prevent excessive context usage
- Move source field to opt-in section (may expose infrastructure details)
- Add tests for XML special characters and TB file sizes
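The XML injection fix in the list above amounts to escaping user-controlled values, such as filenames, before they are placed inside elements. The sketch below shows one common way to do that. `escapeXml` and `toXml` are hypothetical names, and the `<file_metadata>` wrapper is an assumed layout, since this page does not show the PR's actual XML output.

```typescript
// Escape the five characters that are special in XML text and attribute values.
// The ampersand must be replaced first so later entities are not double-escaped.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Assumed layout: one child element per field inside a <file_metadata> wrapper.
// Keys are expected to be fixed field names; only values are escaped here.
function toXml(record: Record<string, string | number>): string {
  const lines = Object.keys(record).map(
    (key) => `  <${key}>${escapeXml(String(record[key]))}</${key}>`
  );
  return ["<file_metadata>"].concat(lines, ["</file_metadata>"]).join("\n");
}
```

Without the escaping step, a filename such as `a</filename><x>.pdf` would close its element early and let an uploader inject arbitrary markup into the context sent to the model.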
@acancino-gnv acancino-gnv force-pushed the feat/file_metadata_exposure_to_llms branch from 99fb824 to ced7e6e Compare December 29, 2025 16:45
@acancino-gnv

@danny-avila could you review it again, please? I fixed all the Copilot issues and ran manual tests on my end to confirm that metadata injection works properly with different providers.
