Commit 20916be
fix: prevent OTel context leak in fire-and-forget background tasks (#5168)
## What's the problem?
When you look at a trace in Jaeger, you expect it to show what happened
during a single request. Instead, we found traces that looked like this
during load testing:
- A request that took **5 seconds** showed a trace lasting **62
seconds**
- That trace contained **2,594 spans**, including **334 database writes
that belonged to completely different requests**
The trace was essentially garbage -- you couldn't tell what actually
happened during the request vs. what leaked in from other requests
happening at the same time.
## Why does this happen?
The server uses background worker tasks to write data to the database
without blocking the API response. These workers are long-lived -- they
start up once and process a shared queue forever.
The problem is how Python's `asyncio.create_task` works: it copies all
context variables (including the OpenTelemetry trace context) at the
moment the task is created. So whichever API request happens to
**first** trigger worker creation permanently stamps its trace ID onto
that worker. Every database write the worker processes from that point
forward -- regardless of which request it came from -- gets attributed
to that original request's trace.
```
Request A arrives → spawns worker → worker inherits trace A
Request B arrives → enqueues work → worker processes it under trace A ← wrong!
Request C arrives → enqueues work → worker processes it under trace A ← wrong!
...forever
```
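The inheritance described above can be reproduced with the standard library alone. The sketch below is an illustration, not the server's code: a plain `ContextVar` named `trace_id` stands in for the OpenTelemetry trace context (which, by default, OpenTelemetry's Python SDK also keeps in a context variable), and `worker`, `request`, and `demo_leak` are hypothetical names.

```python
import asyncio
import contextvars

# Stand-in for the OTel trace context (assumption: illustration only).
trace_id = contextvars.ContextVar("trace_id", default=None)


async def worker(queue, seen):
    # Long-lived worker: it keeps the context it was created with.
    while True:
        item = await queue.get()
        seen.append((item, trace_id.get()))
        queue.task_done()


async def request(name, queue, state, seen):
    trace_id.set(name)  # each request runs under its own trace
    if state.get("worker") is None:
        # The first request to arrive spawns the worker. create_task copies
        # the current context, so the worker keeps trace_id == name forever.
        state["worker"] = asyncio.create_task(worker(queue, seen))
    await queue.put(f"write-from-{name}")


async def demo_leak():
    queue, state, seen = asyncio.Queue(), {}, []
    for name in ("A", "B", "C"):
        # Each request is its own task and therefore has its own context.
        await asyncio.create_task(request(name, queue, state, seen))
    await queue.join()
    state["worker"].cancel()
    return seen


leaked = asyncio.run(demo_leak())
```

Every entry in `leaked` is tagged with trace `A`, even though two of the three writes came from requests `B` and `C`.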
## How does this fix it?
Two changes working together:
**1. Workers start with a clean slate.**
A new helper (`create_task_with_detached_otel_context`) creates the
worker task with an empty trace context, so it doesn't permanently
inherit any request's identity.
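One way to start a task without the caller's context is to create it while running inside a fresh `contextvars.Context`. The sketch below assumes that approach; `create_task_with_empty_context` is a hypothetical stand-in, and the real `create_task_with_detached_otel_context` may well clear only the OTel context rather than every context variable.

```python
import asyncio
import contextvars

# Stand-in for the OTel trace context (assumption: illustration only).
trace_id = contextvars.ContextVar("trace_id", default=None)


def create_task_with_empty_context(coro):
    """Hypothetical helper: schedule coro in a brand-new, empty Context."""
    loop = asyncio.get_running_loop()
    # Task creation copies whatever context is current at that moment, so
    # the task is created while an empty Context is the current one.
    return contextvars.Context().run(loop.create_task, coro)


async def report():
    return trace_id.get()


async def demo_detached():
    trace_id.set("A")
    inherited = await asyncio.create_task(report())  # copies trace "A"
    detached = await create_task_with_empty_context(report())  # sees default
    return inherited, detached


inherited, detached = asyncio.run(demo_detached())
```

On Python 3.11 and later, `asyncio.create_task(coro, context=contextvars.Context())` should achieve the same effect; the `Context.run` form also works on older versions.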
**2. Each queue item carries its own trace context.**
When a request enqueues work, it snapshots its current trace context and
attaches it to the queue item. When the worker picks up that item, it
temporarily activates the captured context for the duration of that
work, then returns to a clean state before processing the next item.
```
Request A arrives → enqueues work with trace A context
Request B arrives → enqueues work with trace B context
Worker (no trace) → picks up item A → activates trace A → writes to DB → deactivates
→ picks up item B → activates trace B → writes to DB → deactivates
```
The result: each database write shows up under the correct request's
trace. No inflation, no cross-contamination.
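The capture-and-activate flow can be sketched with the same standard-library stand-in. Here `capture` and `activate` are simplified, hypothetical analogues of `capture_otel_context` and `activate_otel_context`; they operate on a plain `ContextVar`, not on the OTel API, and the list append stands in for the database write.

```python
import asyncio
import contextvars
from contextlib import contextmanager

# Stand-in for the OTel trace context (assumption: illustration only).
trace_id = contextvars.ContextVar("trace_id", default=None)


def capture():
    """Snapshot the caller's current trace."""
    return trace_id.get()


@contextmanager
def activate(captured):
    """Temporarily restore a captured trace, then return to the prior state."""
    token = trace_id.set(captured)
    try:
        yield
    finally:
        trace_id.reset(token)


async def worker(queue, writes, idle):
    while True:
        payload, captured = await queue.get()
        with activate(captured):
            writes.append((payload, trace_id.get()))  # the "DB write"
        idle.append(trace_id.get())  # trace seen between items
        queue.task_done()


async def request(name, queue):
    trace_id.set(name)
    await queue.put((f"write-from-{name}", capture()))


async def demo_fixed():
    trace_id.set("startup")  # a context the worker must not inherit
    queue, writes, idle = asyncio.Queue(), [], []
    loop = asyncio.get_running_loop()
    # Start the worker from an empty Context so it carries no trace.
    task = contextvars.Context().run(
        loop.create_task, worker(queue, writes, idle)
    )
    for name in ("A", "B"):
        await asyncio.create_task(request(name, queue))
    await queue.join()
    task.cancel()
    return writes, idle


writes, idle = asyncio.run(demo_fixed())
```

Each write is recorded under the trace of the request that enqueued it, and the worker holds no trace between items.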
## What changed?
| File | What it does |
|------|-------------|
| `core/task.py` (new) | Three utilities: `create_task_with_detached_otel_context` (start tasks clean), `capture_otel_context` (snapshot current context), `activate_otel_context` (temporarily restore a captured context) |
| `inference_store.py` | Queue items now carry the OTel context; workers activate it per-item before writing |
| `openai_responses.py` | Same pattern for the responses background worker |
## How is this tested?
**14 new tests** across three files:
- **`test_task.py`** (9 tests) -- validates the primitives: detached
tasks get clean context, captured context can be re-activated, context
flows correctly through a queue, and two requests don't contaminate each
other
- **`test_inference_store.py`** (2 tests) -- end-to-end with a real
SQLite-backed InferenceStore: simulates two API requests, lets the queue
+ workers process the writes, and asserts each write lands in the
correct trace (this directly reproduces the original bug)
- **`test_responses_background.py`** (3 tests) -- same validation for
the responses worker, plus a test proving that error-handling DB writes
(marking a response as failed) are also attributed to the correct trace
## Test plan
- [x] All 14 new unit tests pass
- [x] All existing unit tests unaffected
- [x] Inference and Responses API tests that use in-memory OTel span
collectors pass
---------
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Charlie Doern <cdoern@redhat.com>
1 parent 1bdf971 commit 20916be
File tree: 6 files changed, +774 -75 lines changed
- src/llama_stack
  - core
  - providers
    - inline/agents/builtin/responses
    - utils/inference
- tests/unit
  - core
  - providers/agents/builtin
  - utils/inference