@iyehuda iyehuda commented Oct 22, 2025

Overview

Introduces a context-aware taint tracking mechanism to protect against
arbitrary code execution during pickle deserialization.

Core mechanism

  • Added security_ctx struct to PyContext with deserialization_taint_counter

  • New internal API to track deserialization state:

    • _PyContext_IncrementDeserializationTaint()
    • _PyContext_DecrementDeserializationTaint()
    • _PyContext_IsDeserializationTainted()
  • Taint counter is incremented when entering pickle.loads() and decremented
    on exit (both success and error paths)

  • Taint state propagates to new contexts created during deserialization
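The counter semantics listed above can be modelled in pure Python. This is only a rough sketch: the PR keeps the counter in the C-level PyContext struct, whereas this model uses a `contextvars.ContextVar`, and the names `guarded_loads`, `increment_taint`, `decrement_taint`, `is_tainted` and `tainted_in_new_context` are hypothetical stand-ins for the internal C API.

```python
import contextvars
import pickle

# Hypothetical pure-Python stand-in for the C-level deserialization_taint_counter.
_taint_counter = contextvars.ContextVar("deserialization_taint_counter", default=0)


def increment_taint():
    _taint_counter.set(_taint_counter.get() + 1)


def decrement_taint():
    _taint_counter.set(_taint_counter.get() - 1)


def is_tainted():
    # A counter (rather than a flag) keeps nested loads() calls balanced.
    return _taint_counter.get() > 0


def guarded_loads(data):
    """Call pickle.loads() with the counter raised for the duration of the call."""
    increment_taint()
    try:
        return pickle.loads(data)
    finally:
        # Runs on both the success path and the error path.
        decrement_taint()


def tainted_in_new_context():
    """Report the taint state as seen from a freshly copied context."""
    return contextvars.copy_context().run(is_tainted)
```

Because `copy_context()` snapshots the current values, a context created while the counter is raised starts out tainted, which mirrors the propagation rule in the last bullet.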

Design rationale

This change extends PyContext (the thread context) because it is the existing
mechanism for context variables and is natively supported by higher-level
concurrency models such as asyncio. Storing the security state in the C struct
prevents malicious user code from overriding or bypassing the protection.
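The asyncio support mentioned above can be seen with ordinary context variables. The sketch below uses only the public `contextvars` and `asyncio` modules, not the PR's C field, and the variable name `marker` is made up for illustration. A task copies the current context when it is created, so it sees the value set by its parent at that moment.

```python
import asyncio
import contextvars

# Illustrative variable; the PR stores its state in the C struct instead.
marker = contextvars.ContextVar("marker", default="unset")


async def child():
    # Runs in the copy of the context taken when the task was created.
    return marker.get()


async def main():
    marker.set("set-by-parent")
    task = asyncio.create_task(child())
    # Changes made after task creation are not visible to the task.
    marker.set("changed-after-task-creation")
    return await task, marker.get()


result = asyncio.run(main())
```

State attached to the context in this way follows the logical flow of execution across tasks, which is presumably why the PR builds on PyContext rather than on plain thread-local storage.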

Protection via audit hooks

  • When deserialization is active, blocks dangerous operations including:
    • System commands (os.system, subprocess.Popen)
    • File modifications and thread creation
    • Dynamic loading (ctypes) and network operations

- Updated `test_os_system_allowed_outside_pickle` to use `os.system('true')` for better clarity.
- Improved handling of event loop policies in `test_taint_cleared_on_error` to ensure proper cleanup.
- Removed outdated tests for `exec` and `compile` blocking, as they are no longer blocked.
- Introduced a new `HARDEN_MODE` configuration that controls how the deserialization security checks behave: depending on the mode set via the `PYTHONHARDENMODE` environment variable, a violation produces either a warning or an error.
- Fixed failing tests.
- Introduced a Dockerfile for building a Python environment based on Debian.
- Configured essential packages and optimizations for Python installation.
- Added a .dockerignore file to exclude unnecessary files and directories from the Docker context, improving build efficiency.
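For the `HARDEN_MODE` item above, one way a warn-or-error switch could look is sketched below. Only the environment variable name `PYTHONHARDENMODE` comes from this PR; the accepted values `"warn"` and `"error"`, the default of `"error"`, and the helper names `harden_mode` and `report_violation` are all hypothetical.

```python
import os
import warnings

# Hypothetical set of accepted values; the PR does not list them here.
_KNOWN_MODES = {"warn", "error"}


def harden_mode(environ=os.environ):
    """Return the configured mode, falling back to the (assumed) strict default."""
    value = environ.get("PYTHONHARDENMODE", "error").lower()
    return value if value in _KNOWN_MODES else "error"


def report_violation(event, mode):
    """Warn or raise for a blocked event, depending on the mode."""
    message = f"{event} attempted during deserialization"
    if mode == "warn":
        warnings.warn(message, RuntimeWarning, stacklevel=2)
    else:
        raise RuntimeError(message)
```

A warning mode of this kind would let a deployment log violations before switching to hard failures.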