Commit 1d62323

Josu San Martin authored and committed
Add security review for in-process result authentication
1 parent 283c9b6

2 files changed: +215 −0 lines

README.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -75,3 +75,7 @@ again to minimize the window of opportunity for cheating by writing results from
 small effect on performance, as during the tail of the user kernel blocks of the test kernel are already put on the SMs
 and generate memory traffic. In the checking kernel, the order in which blocks are checked is randomized, so that it is
 not a viable strategy to only write the later blocks of the result from an unsynchronized stream.
+
+## Security Review
+
+A repository-level security note for the remaining in-process trust-boundary issues is documented in [SECURITY_REVIEW.md](SECURITY_REVIEW.md).
```

SECURITY_REVIEW.md

Lines changed: 211 additions & 0 deletions
# Security Review: Result Authentication and In-Process Secret Exposure

## Summary

`pygpubench` is a substantial improvement over a pure-Python evaluator, but it still keeps trusted benchmark state inside the same process as untrusted submission code.

That remaining trust-boundary issue appears to allow a malicious submission to forge benchmark output without running the intended kernel: by recovering the child-process result-authentication secret from memory, it can write attacker-controlled results into the inherited result channel before the real benchmark loop completes.

This document is intentionally remediation-oriented:

- it does **not** include exploit payloads
- it does **not** include step-by-step reproduction code
- it focuses on the issue, impact, and remediation options
## Main Finding

The current isolated benchmarking path still relies on a secret known to the worker process itself.

At a high level:

1. the parent process creates a benchmark subprocess
2. a result-authentication secret is delivered to the child
3. the child stores that secret in process memory
4. untrusted Python submission code runs in that same process
5. the child still has access to the result channel used to report benchmark results

That means the worker can potentially:

- recover the secret from its own address space
- emit forged benchmark results carrying the correct authentication material
- cause the parent to accept attacker-controlled timings

In other words, the integrity of the result channel still depends on trusting code inside the process being benchmarked.
## Why This Matters

If the child process can authenticate arbitrary forged results, then:

- reported timings no longer prove that the benchmarked kernel actually ran
- reported error counts no longer prove that correctness checks actually passed
- benchmark output can be made arbitrarily small or otherwise attacker-controlled

This is a benchmark-integrity failure, not just a local implementation bug.
## Additional Weaknesses Observed

The signature/authentication issue is the primary concern. Several secondary findings make exploitation easier or increase future risk:

### 1. GC-visible benchmark metadata

At import time, Python-visible objects can still reveal useful benchmark structure such as:

- number of repeats
- output tensor metadata
- tolerance information

Even if tensor payloads are protected better than before, metadata leakage reduces attacker uncertainty.

### 2. Warmup predictability

If warmup always uses a deterministic case or stable pointer pattern, a malicious kernel may distinguish warmup from measured iterations and adapt its behavior accordingly.

### 3. NaN wildcard handling

Any checker behavior that treats NaN in expected data as “accept anything” is dangerous. Even if not immediately exploitable through the current path, it creates a latent bypass if expected-output addresses or copies become observable later.

### 4. Overly broad in-process capability

Untrusted Python code still runs with:

- arbitrary `ctypes` access
- process-memory visibility
- inherited file descriptors / pipes
- normal Python runtime introspection

That combination is enough to make “secret inside the same process” a weak design.
## What `pygpubench` Already Improves

This report should not obscure the fact that `pygpubench` already fixes important problems that affect naive Python evaluators.

Compared to an in-process pure-Python benchmark harness, `pygpubench` materially improves resistance against:

- Python monkeypatching of timer objects
- direct patching of Python reference/evaluator functions
- trivial caching of user-visible tensors
- some stream-ordering and L2-cache-based reward hacking

So the right framing is:

- the architecture is **better**
- but the remaining secret/result-channel design is still not strong enough for adversarial benchmarking
## Root Cause

The benchmark subprocess is simultaneously:

- the environment running untrusted code
- the holder of trusted result-authentication state

As long as the child both:

1. possesses the authentication material, and
2. can write to the channel accepted by the parent,

the scheme is vulnerable in principle.

The security model still assumes the worker can be trusted with some benchmark-control state. In an adversarial benchmark, it cannot.
## Recommended Fixes

### 1. Do not keep the authentication secret alive inside untrusted execution

Any key, signature, token, or HMAC material used to authenticate results should not remain recoverable after untrusted Python code starts executing.

At minimum:

- generate the key in trusted code
- consume it before importing the submission
- explicitly overwrite it in the child

This reduces the straightforward memory-recovery path.
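A minimal sketch of the "explicitly overwrite it in the child" step, assuming the secret arrives in a mutable buffer (the `wipe` helper and the buffer contents are illustrative, not `pygpubench` APIs):

```python
import ctypes

def wipe(buf: bytearray) -> None:
    # Overwrite the secret bytes in place. This only helps for mutable
    # buffers: an immutable `bytes` object cannot be reliably scrubbed in
    # CPython, and copies made during IPC deserialization may survive
    # elsewhere in the heap.
    ctypes.memset((ctypes.c_char * len(buf)).from_buffer(buf), 0, len(buf))

secret = bytearray(b"delivered-secret-material")  # hypothetical delivered secret
# ... derive whatever the trusted setup needs from `secret` here ...
wipe(secret)  # must happen before the submission is imported
assert all(b == 0 for b in secret)
```

Because such scrubbing is best-effort in a garbage-collected runtime, it complements rather than replaces moving authentication out of the child entirely.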
### 2. Move result authentication to a trusted boundary

A stronger fix is to ensure that the worker process never has the ability to authenticate arbitrary forged results.

Two good directions:

- trusted validator/orchestrator process owns the authentication
- worker emits only raw events/results, never authenticated final records

The parent should authenticate data that the worker cannot forge by construction.
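One way to realize that direction, sketched with the standard-library `hmac` module (the function and key names are illustrative assumptions, not the project's API): the worker writes an unauthenticated record, and only the trusted parent, which alone holds the key, produces the tag it will later accept.

```python
import hashlib
import hmac
import json
import os

PARENT_KEY = os.urandom(32)  # generated and held only in the trusted parent

def accept_raw_result(raw: bytes) -> dict:
    # The worker emitted `raw` with no authentication material at all.
    # Tagging happens here, on the trusted side, so possession of a valid
    # tag proves the record passed through the parent, not the worker.
    record = json.loads(raw)
    tag = hmac.new(PARENT_KEY, raw, hashlib.sha256).hexdigest()
    return {"record": record, "tag": tag}

def verify(raw: bytes, tag: str) -> bool:
    # Constant-time comparison against a freshly recomputed tag.
    return hmac.compare_digest(
        tag, hmac.new(PARENT_KEY, raw, hashlib.sha256).hexdigest()
    )

raw = json.dumps({"kernel_ms": 1.23, "errors": 0}).encode()
accepted = accept_raw_result(raw)
print(verify(raw, accepted["tag"]))  # -> True
```

Since `PARENT_KEY` never crosses into the child, no amount of worker-side memory introspection yields forgeable authentication material.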
### 3. Reduce in-process attack surface

Possible hardening measures:

- restrict inherited file descriptors
- tighten seccomp / syscall policy where practical
- minimize procfs visibility where practical
- reduce or remove unnecessary writable/redirectable channels in the worker

These are secondary mitigations, not substitutes for a correct trust boundary.
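For the file-descriptor item, a POSIX sketch using `subprocess` (the command layout is illustrative): `pass_fds` forces `close_fds=True`, so the permitted descriptor set is explicit and everything else inherited from the parent is closed in the child.

```python
import os
import subprocess
import sys

def spawn_worker(cmd_args, result_write_fd: int) -> subprocess.Popen:
    # Start the worker with every inherited descriptor closed except the
    # single result pipe it is allowed to write. close_fds=True is already
    # the default in Python 3; pass_fds makes the allowed set explicit.
    return subprocess.Popen(
        [sys.executable, "-I", *cmd_args],  # -I: isolated mode
        close_fds=True,
        pass_fds=(result_write_fd,),
    )

# Demo: the child can only reach the descriptor we deliberately passed.
r, w = os.pipe()
p = spawn_worker(["-c", f"import os; os.write({w}, b'ok')"], w)
p.wait()
os.close(w)
data = os.read(r, 2)
print(data.decode())  # -> ok
```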
### 4. Remove or minimize metadata leakage

Before importing the submission:

- drop Python references that are no longer needed
- clean up transient objects that reveal benchmark layout
- avoid keeping expected-output metadata visible through ordinary Python object traversal
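A best-effort sketch of that cleanup, assuming the trusted setup tracks which names must survive (the helper is hypothetical): dropping references before the import shrinks what `gc.get_objects()`-style traversal can reach, though objects still referenced elsewhere survive.

```python
import gc

def scrub_before_import(namespace: dict, keep: frozenset) -> None:
    # Delete trusted-side references the submission has no business seeing,
    # then force a collection so the freed objects stop being reachable
    # through ordinary Python object traversal.
    for name in list(namespace):
        if name not in keep and not name.startswith("__"):
            del namespace[name]
    gc.collect()

ns = {"expected_meta": {"shape": (1024,)}, "run_kernel": object}
scrub_before_import(ns, keep=frozenset({"run_kernel"}))
print(sorted(ns))  # -> ['run_kernel']
```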
### 5. Randomize warmup and pre-measurement behavior

Avoid deterministic warmup patterns that allow the kernel to distinguish:

- warmup
- estimated timing
- real benchmark passes
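One possible randomization scheme, sketched below (the scheduling helper is an assumption, not the current harness): draw warmup and measured iterations from the same case pool and keep the measured subset secret on the trusted side, so no iteration is distinguishable from the kernel's point of view.

```python
import random

def build_schedule(cases, n_warmup: int, n_measured: int):
    total = n_warmup + n_measured
    # Every iteration draws from the same pool, so warmup iterations are
    # statistically indistinguishable from measured ones.
    schedule = [random.choice(cases) for _ in range(total)]
    # Which iterations count is decided here, on the trusted side, and is
    # never revealed to the kernel.
    measured = frozenset(random.sample(range(total), n_measured))
    return schedule, measured

schedule, measured = build_schedule(["case_a", "case_b", "case_c"], 5, 20)
print(len(schedule), len(measured))  # -> 25 20
```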
### 6. Fail closed on suspicious checker states

Examples:

- NaN in expected outputs should be treated as a benchmark-generation failure, not a wildcard
- malformed or incomplete child output should fail hard
- any mismatch in result structure should fail hard
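The NaN rule takes only a few lines to enforce; sketched here for a flat list of floats (`validate_expected` is a hypothetical name):

```python
import math

def validate_expected(expected):
    # Fail closed: NaN in the expected output means benchmark generation
    # itself failed. It must never degrade into an accept-anything wildcard.
    if any(math.isnan(x) for x in expected):
        raise ValueError("expected output contains NaN; aborting benchmark")
    return expected

validate_expected([1.0, 2.5])  # clean data passes through unchanged
try:
    validate_expected([1.0, float("nan")])
except ValueError as e:
    print(e)  # -> expected output contains NaN; aborting benchmark
```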
## Suggested Architectural Direction

### Option A: Trusted validator split

Use a three-role model:

- orchestrator: trusted, owns benchmark policy and result acceptance
- worker: untrusted, only runs the kernel
- validator: trusted, checks correctness and/or authenticates the final result

The worker should not be able to independently produce a parent-acceptable final benchmark record.

### Option B: Transitional hardening

If a full redesign is not immediately feasible:

1. remove recoverable result-authentication state before user import
2. aggressively reduce inherited descriptors/capabilities
3. clear Python-visible benchmark metadata before import
4. add tamper detection for result-channel anomalies

This would still be weaker than a proper split-process trust model, but materially better than the current design.
## Recommended Next Steps

1. treat this as a security bug affecting adversarial benchmark integrity
2. review the result-authentication path end-to-end
3. patch the child secret lifetime / ownership problem first
4. then follow up with capability reduction and metadata cleanup
5. publish a brief security note once the fix lands
## Scope Of This Document

This file is intended to support remediation planning inside the repository.

It does **not**:

- provide exploit code
- attribute methods to any third party
- claim that every theoretical path has been weaponized

It only records that the current result-authentication design remains vulnerable because trusted benchmark state is still exposed inside the untrusted worker process.
