
@padak commented Nov 11, 2025

Summary

Fixes one HIGH-severity security issue and one MEDIUM-severity performance issue in directory file handling.

1. Path Traversal Vulnerability (HIGH)

The files_from_dir() and async_files_from_dir() functions followed symlinks without checking where they pointed, allowing path traversal:

  • A symlink to a directory outside the root (e.g. evil_link pointing to /etc) let files such as /etc/passwd be read and uploaded
  • The upload would carry a harmless-looking relative path such as evil_link/passwd
  • Sensitive local files could leak to the API

Fix: Resolve all paths and verify they remain within the root directory.
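To illustrate the attack and the fix, here is a minimal, standard-library-only sketch, assuming POSIX symlinks. The helpers _walk, naive_names, contained_names and demo are hypothetical and are not the SDK's code; they only contrast an unchecked walk with one that resolves each file and keeps it only if the resolved target stays under the resolved root.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List, Tuple


def _walk(directory: Path) -> Iterator[Path]:
    """Recursively yield files; is_dir() follows symlinks, so linked directories are entered."""
    for entry in sorted(directory.iterdir()):
        if entry.is_dir():
            yield from _walk(entry)
        else:
            yield entry


def naive_names(root: Path) -> List[str]:
    """Unchecked walk (hypothetical): every file reached, even through a symlink, is reported."""
    return sorted(p.relative_to(root).as_posix() for p in _walk(root))


def contained_names(root: Path) -> List[str]:
    """Checked walk (hypothetical): resolve each file and keep it only if it stays under the root."""
    resolved_root = root.resolve()
    names = []
    for p in _walk(root):
        try:
            p.resolve().relative_to(resolved_root)
        except ValueError:
            continue  # the symlink target escapes the root: skip it
        names.append(p.relative_to(root).as_posix())
    return sorted(names)


def demo() -> Tuple[List[str], List[str]]:
    """Build root/notes.txt plus root/evil_link pointing at a directory outside the root."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        outside = base / "outside"
        outside.mkdir()
        (outside / "passwd").write_text("secret")
        root = base / "upload_root"
        root.mkdir()
        (root / "notes.txt").write_text("hello")
        (root / "evil_link").symlink_to(outside, target_is_directory=True)
        return naive_names(root), contained_names(root)


if __name__ == "__main__":
    naive, safe = demo()
    print(naive)  # ['evil_link/passwd', 'notes.txt']
    print(safe)   # ['notes.txt']
```

The unchecked walk reports the outside file under the innocent name evil_link/passwd; the checked walk drops it. Resolving the root as well matters on systems where the temp directory itself sits behind a symlink (e.g. /var on macOS).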

2. Memory Exhaustion (MEDIUM)

The functions loaded entire directory trees into memory simultaneously:

  • All files were read with read_bytes() and stored in a list
  • Large directories (hundreds of MB) could exhaust available RAM
  • There was no way to process large file trees incrementally

Fix: Added new streaming iterator versions that yield files one at a time:

  • files_from_dir_iter() - synchronous generator
  • async_files_from_dir_iter() - async generator

Original functions preserved for backwards compatibility.
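A generator lets the caller consume one file before the next is read. The sketch below is one possible shape, assuming each item is a (relative POSIX name, bytes) tuple; that item shape, the visited-set cycle guard, and the list-returning wrapper are assumptions for illustration, not necessarily how the PR implements files_from_dir_iter() or keeps files_from_dir().

```python
import tempfile
from pathlib import Path
from typing import Iterator, List, Set, Tuple

# Assumed item shape for this sketch: (relative POSIX name, file bytes).
FileItem = Tuple[str, bytes]


def files_from_dir_iter(directory: Path) -> Iterator[FileItem]:
    """Yield files one at a time; each file is read only when the consumer asks for it."""
    root = directory.resolve()
    pending = [root]            # directories still to visit (paths as seen under root)
    visited: Set[Path] = set()  # resolved directories already listed, guards against symlink cycles
    while pending:
        current = pending.pop()
        real_dir = current.resolve()
        if real_dir in visited:
            continue
        visited.add(real_dir)
        for entry in sorted(current.iterdir()):
            target = entry.resolve()
            try:
                target.relative_to(root)  # containment check on the resolved target
            except ValueError:
                continue                  # escapes the root via a symlink: skip
            if target.is_dir():
                pending.append(entry)
            elif target.is_file():
                yield entry.relative_to(root).as_posix(), target.read_bytes()


def files_from_dir(directory: Path) -> List[FileItem]:
    """List-returning variant: materialises every file at once, like the original API."""
    return list(files_from_dir_iter(directory))


def demo() -> List[FileItem]:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "outside.txt").write_text("secret")
        root = base / "project"
        (root / "sub").mkdir(parents=True)
        (root / "a.txt").write_bytes(b"A")
        (root / "sub" / "b.txt").write_bytes(b"B")
        (root / "leak").symlink_to(base / "outside.txt")                # escapes the root
        (root / "loop").symlink_to(root, target_is_directory=True)     # cycle back to the root
        return sorted(files_from_dir_iter(root))
```

A caller would typically write `for name, data in files_from_dir_iter(path): ...` and handle each file before the next one is read, so peak memory is bounded by the largest single file rather than the whole tree.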

Implementation Details

Python 3.8 Compatibility

  • Added _is_path_within_root() helper function
  • Uses path.relative_to() + try/except instead of Path.is_relative_to()
  • is_relative_to() was added in Python 3.9, but this SDK supports Python 3.8+

Async Path Handling

  • await anyio.Path.resolve() returns a standard pathlib.Path, not anyio.Path
  • Fixed incorrect await usage on synchronous Path methods
  • Properly convert between anyio.Path and Path types
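The underlying rule is to await only the calls that block on I/O and keep pure path arithmetic such as relative_to() synchronous. The SDK uses anyio; to keep this sketch standard-library only, it swaps in asyncio's run_in_executor to offload blocking calls, so every path stays a plain pathlib.Path. The _run_blocking and _list_dir helpers, the (name, bytes) item shape, and the body of async_files_from_dir_iter() are assumptions, not the PR's implementation.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import AsyncIterator, Callable, List, Set, Tuple, TypeVar

T = TypeVar("T")


async def _run_blocking(func: Callable[..., T], *args: object) -> T:
    """Run a blocking filesystem call in the default thread pool (works on Python 3.8)."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, func, *args)


def _list_dir(directory: Path) -> List[Path]:
    return sorted(directory.iterdir())


async def async_files_from_dir_iter(directory: Path) -> AsyncIterator[Tuple[str, bytes]]:
    """Async generator: only blocking calls are awaited; paths stay pathlib.Path throughout."""
    root = await _run_blocking(directory.resolve)
    pending = [root]
    visited: Set[Path] = set()  # resolved directories already listed, guards against cycles
    while pending:
        current = pending.pop()
        real_dir = await _run_blocking(current.resolve)
        if real_dir in visited:
            continue
        visited.add(real_dir)
        for entry in await _run_blocking(_list_dir, current):
            target = await _run_blocking(entry.resolve)
            try:
                target.relative_to(root)  # pure computation: no await needed
            except ValueError:
                continue                  # escapes the root: skip
            if await _run_blocking(target.is_dir):
                pending.append(entry)
            elif await _run_blocking(target.is_file):
                yield entry.relative_to(root).as_posix(), await _run_blocking(target.read_bytes)


async def _collect(root: Path) -> List[Tuple[str, bytes]]:
    return [item async for item in async_files_from_dir_iter(root)]


def demo() -> List[Tuple[str, bytes]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "project"
        (root / "sub").mkdir(parents=True)
        (root / "a.txt").write_bytes(b"A")
        (root / "sub" / "b.txt").write_bytes(b"B")
        (root / "leak").symlink_to(Path(tmp), target_is_directory=True)  # points outside the root
        return sorted(asyncio.run(_collect(root)))
```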

Error Handling

  • Added try/except for PermissionError and OSError
  • Prevents DoS attacks via unreadable files or broken symlinks
  • Gracefully skips problematic entries instead of crashing
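The skip-instead-of-crash pattern can be sketched as below. Note that PermissionError, FileNotFoundError (raised when reading a broken symlink) and IsADirectoryError are all subclasses of OSError, so a single `except OSError` covers them. The helpers _safe_read and readable_files are hypothetical, and the walk is non-recursive to keep the example short.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List, Optional, Tuple


def _safe_read(path: Path) -> Optional[bytes]:
    """Return the file's bytes, or None if the OS refuses or the path is dangling."""
    try:
        return path.read_bytes()
    except OSError:
        # PermissionError, FileNotFoundError (e.g. a broken symlink) and
        # IsADirectoryError are all subclasses of OSError.
        return None


def readable_files(directory: Path) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, bytes) for the readable files directly inside directory; skip the rest."""
    try:
        entries = sorted(directory.iterdir())
    except OSError:
        return  # unreadable or missing directory: yield nothing instead of raising
    for entry in entries:
        data = _safe_read(entry)
        if data is not None:
            yield entry.name, data


def demo() -> List[Tuple[str, bytes]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "ok.txt").write_bytes(b"fine")
        (root / "dangling").symlink_to(root / "does-not-exist")  # broken symlink
        (root / "subdir").mkdir()                                 # reading a directory fails too
        return list(readable_files(root))
```

Silently skipping keeps one bad entry from aborting the whole upload; a production version might log each skipped path so the omission is visible.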

Changes

  • src/anthropic/lib/_files.py:
    • Added _is_path_within_root() for Python 3.8-compatible containment checks
    • Added path.resolve() and containment validation to all functions
    • Fixed async functions to handle Path/anyio.Path conversion correctly
    • Added PermissionError/OSError handling for filesystem operations
    • Added files_from_dir_iter() for memory-efficient streaming
    • Added async_files_from_dir_iter() for async streaming

Testing

  • ✅ Python 3.8 compatibility
  • ✅ Proper async/await usage
  • ✅ Correct symlink resolution
  • ✅ Error handling for edge cases

Impact

  • Security: Prevents path traversal attacks via symlinks
  • Performance: Provides memory-efficient alternative for large directory trees
  • Compatibility: Maintains Python 3.8+ support
  • Reliability: Improved error handling prevents crashes
  • Backwards Compatibility: All existing APIs preserved, new streaming APIs opt-in

…andling

Fixes two security and performance issues in file directory handling:

1. Path Traversal Vulnerability (HIGH):
   - files_from_dir/async_files_from_dir followed symlinks without validating their targets
   - Symlinks could point outside the root directory (e.g., to /etc/passwd)
   - Attackers could read sensitive files while reporting harmless paths
   - Fix: Resolve all paths and verify they stay within root directory

2. Memory Exhaustion (MEDIUM):
   - All files in a directory tree were loaded into memory simultaneously
   - Large directories (hundreds of MB) could exhaust available RAM
   - Fix: Add streaming iterator versions (files_from_dir_iter,
     async_files_from_dir_iter) that yield files one at a time
   - Original functions preserved for backwards compatibility

Implementation details:
- Added _is_path_within_root() helper for Python 3.8 compatibility
  (Path.is_relative_to() was added in Python 3.9)
- Fixed async functions to handle Path/anyio.Path conversion correctly
  (await path.resolve() returns standard Path, not anyio.Path)
- Added error handling for PermissionError and OSError to prevent
  DoS via unreadable files or broken symlinks
- All path checks use resolve() + relative_to() for safe containment

Changed:
- src/anthropic/lib/_files.py:
  * Added _is_path_within_root() for Python 3.8 compatibility
  * Added path.resolve() and containment checks to all functions
  * Fixed async functions to properly handle Path types
  * Added error handling for permission errors
  * Added files_from_dir_iter() for streaming
  * Added async_files_from_dir_iter() for async streaming
@padak requested a review from a team as a code owner on November 11, 2025, 06:01