Description
System Info
Collecting environment information...
PyTorch version: 2.10.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 26.3.1 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.6.4.2)
CMake version: Could not collect
Libc version: N/A
Python version: 3.11.9 (v3.11.9:de54cf5be3, Apr 2 2024, 07:12:50) [Clang 13.0.0 (clang-1300.0.29.30)] (64-bit runtime)
Python platform: macOS-26.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A
CPU:
Apple M3 Pro
Versions of relevant libraries:
[pip3] numpy==2.4.2
[pip3] torch==2.10.0
[conda] Could not collect
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
In the #5168 description, @iamemilio highlights an issue where asyncio.create_task automatically copies all context vars into long-running background workers at task-creation time, leading to incorrect/misleading OTEL traces, because those traces linger long after the initial request has completed.
In addition to OTEL traces, PROVIDER_DATA_VAR, which contains sensitive authentication/authorization token information, is also copied into these workers' memories and is not correctly flushed when the write queue is enabled. This leads to a leak where user isolation is compromised for any stored resources: users may access each other's stored conversations and responses.
The following description was generated by Claude:
How it works step by step
Step 1: Worker creation inherits the wrong user
When the first request arrives, the middleware sets PROVIDER_DATA_VAR with that user's identity:
server.py Lines 304-304

```python
with request_provider_data_context(headers, user):
```

Deep inside the request handler, `store_chat_completion` lazily creates worker tasks:
inference_store.py Lines 101-103

```python
if not self._worker_tasks:
    loop = asyncio.get_running_loop()
    task = loop.create_task(self._worker_loop())
```

`asyncio.create_task` copies all contextvars -- including `PROVIDER_DATA_VAR`. The worker permanently inherits User A's identity. The same applies to the responses background worker.
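This inheritance can be reproduced in isolation with a few lines of asyncio. A minimal sketch, where `USER_VAR` is a hypothetical stand-in for `PROVIDER_DATA_VAR`:

```python
import asyncio
import contextvars

USER_VAR = contextvars.ContextVar("user", default=None)
captured = []

async def worker():
    # The worker sees whatever USER_VAR held when the task was CREATED,
    # not when it runs.
    captured.append(USER_VAR.get())

async def main():
    USER_VAR.set("user-a")
    loop = asyncio.get_running_loop()
    task = loop.create_task(worker())  # context snapshot is taken here
    USER_VAR.set("user-b")             # later changes do not reach the task
    await task

asyncio.run(main())
print(captured)  # ['user-a']
```

The task's context is a copy made at creation time, so mutations in the spawning context after `create_task` are invisible to the worker.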
Step 2: Every DB write stamps the worker's (wrong) user
When the worker writes to the database, it goes through AuthorizedSqlStore:
authorized_sqlstore.py Lines 132-140

```python
async def insert(self, table: str, data: Mapping[str, Any] | Sequence[Mapping[str, Any]]) -> None:
    """Insert a row or batch of rows with automatic access control attribute capture."""
    current_user = get_authenticated_user()
    # ...
    enhanced_data = _enhance_item_with_access_control(data, current_user)
    await self.sql_store.insert(table, enhanced_data)
```

`get_authenticated_user()` reads `PROVIDER_DATA_VAR` from the current task's context -- which is the first user's identity, regardless of who made the request. So `owner_principal` is stamped wrong.
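The effect on a long-lived worker can be sketched as follows. This is a simplified model, assuming a hypothetical `get_authenticated_user()` that reads the contextvar, as in the real code; `writes` stands in for the database:

```python
import asyncio
import contextvars

# Hypothetical stand-ins for PROVIDER_DATA_VAR / get_authenticated_user()
PROVIDER_DATA_VAR = contextvars.ContextVar("provider_data", default=None)

def get_authenticated_user():
    return PROVIDER_DATA_VAR.get()

writes = []

async def worker_loop(queue):
    while True:
        item = await queue.get()
        if item is None:
            break
        # Owner is read from the worker's own (frozen) context,
        # not from the request that enqueued the item.
        writes.append((item, get_authenticated_user()))

async def main():
    queue = asyncio.Queue()
    loop = asyncio.get_running_loop()
    PROVIDER_DATA_VAR.set("user-a")        # User A's request spawns the worker
    worker = loop.create_task(worker_loop(queue))
    await queue.put("a-completion")
    PROVIDER_DATA_VAR.set("user-b")        # User B's request only enqueues
    await queue.put("b-completion")
    await queue.put(None)                  # shut the worker down
    await worker

asyncio.run(main())
print(writes)  # both rows stamped with "user-a"
```

User B's row ends up owned by User A, because the worker's context was frozen when User A's request created it.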
For the responses worker, update also overwrites owner_principal:
authorized_sqlstore.py Lines 222-235

```python
async def update(self, table: str, data: Mapping[str, Any], where: Mapping[str, Any]) -> None:
    """Update rows with automatic access control attribute capture."""
    enhanced_data = dict(data)
    current_user = get_authenticated_user()
    if current_user:
        enhanced_data["owner_principal"] = current_user.principal
        enhanced_data["access_attributes"] = current_user.attributes
    # ...
```

The response was created with the correct user during the synchronous part of the request, but when the background worker updates its status to "completed" or "failed", it overwrites `owner_principal` with the first user's identity.
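The create-then-overwrite sequence can be modelled with an in-memory store. A sketch, using hypothetical `insert`/`update` helpers that mirror the re-stamping behavior described above (`db` stands in for the SQL store):

```python
import asyncio
import contextvars

# Hypothetical stand-ins for PROVIDER_DATA_VAR and the SQL store
PROVIDER_DATA_VAR = contextvars.ContextVar("provider_data", default=None)
db = {}

def insert(rid, data):
    db[rid] = {**data, "owner_principal": PROVIDER_DATA_VAR.get()}

def update(rid, data):
    # Mirrors the update path: re-stamps owner_principal on every write
    db[rid].update(data)
    db[rid]["owner_principal"] = PROVIDER_DATA_VAR.get()

async def main():
    loop = asyncio.get_running_loop()
    updates = asyncio.Queue()

    async def worker_loop():
        rid, data = await updates.get()
        update(rid, data)              # runs in the worker's frozen context

    PROVIDER_DATA_VAR.set("user-a")    # User A's request spawns the worker
    worker = loop.create_task(worker_loop())

    PROVIDER_DATA_VAR.set("user-b")    # User B's request
    insert("resp-1", {"status": "in_progress"})   # synchronous path: correct owner
    assert db["resp-1"]["owner_principal"] == "user-b"

    await updates.put(("resp-1", {"status": "completed"}))
    await worker                       # background update re-stamps the owner

asyncio.run(main())
print(db["resp-1"]["owner_principal"])  # 'user-a' (overwritten)
```

The row is created correctly for User B, then silently transferred to User A by the background status update.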
Existing tests don't catch this, for two reasons:
- The write queue is disabled for SQLite, which is what most unit and integration tests use. This forces synchronous writes, which see the correct PROVIDER_DATA_VAR values. The bug only manifests on Postgres.
- Existing user isolation tests only exercise the synchronous path, not the write-queue path.
When does it actually matter?
The leak is harmful when all three conditions are true:
- Authentication is enabled -- otherwise get_authenticated_user() returns None everywhere, all records are "unowned/public", and there's no identity to leak
- Access control policies are configured (like user is owner) -- otherwise all users can see all records anyway
- The write queue is active: Postgres for the inference store (and always for the responses background worker)
In the default configuration (no auth, SQLite), the bug is completely invisible. In a production multi-tenant deployment with Postgres and auth -- which is exactly the deployment most likely to care about isolation -- it's a real data isolation violation where User A can see User B's completions/responses.
Error logs
N/A
Expected behavior
Context inheritance should be fixed so that each worker starts with a clean slate, and user identity does not persist in memory beyond the scope of the request that spawned the worker.