[wip] fix some hackable level 1 problems #118

bkal01 · 2026-01-07T07:13:46Z

As stated in this issue and this PR by @doux-jy, it's possible to hack 94: MSE via the statistical properties of the input.

96: Huber Loss and 99: Hinge Loss also face the same problem and we can fix these issues the same way, by sampling inputs from a heavy-tailed distribution.

Huber Loss

For the Huber Loss, given some targets $y$ and predictions $\hat{y}$, we compute:

$$ L(y, \hat{y} = \begin{cases} 0.5(y-\hat{y})^2 & \text{if} |y - \hat{y}|\leq 1 \\ |y - \hat{y}| - 0.5& \text{otherwise} \end{cases} $$

If $y, \hat{y}, s \sim \text{Unif}(0,1)$, where $s$ is the scale, then $|y - \hat{y}|$ can never exceed 1, implying that the Huber Loss problem is functionally just computing the MSE Loss. Since we know this can be hacked via direct expectation computation, the Huber Loss problem is also vulnerable.

Hinge Loss

For the Hinge Loss, given some targets $y$ that are $\pm 1$ and predictions $\hat{y}$, we compute:

$$ L(y_i, \hat{y_i}) = \text{max}(0, 1 - y_i\cdot \hat{y}_i) $$

averaged across all indices. Given that $\hat{y}\sim\text{Unif}(0,1)$ and $y_i$ is $\pm 1$ with equal probability, we can say:

$$ L(y_i, \hat{y_i}) = \begin{cases} \mathbb{E}[1 - \hat{y_i}] & y_i = 1 \\ \mathbb{E}[1 + \hat{y_i}] & y_i = -1 \end{cases} $$

So,

$$\mathbb{E}[L | y] = \frac{0.5n_{\text{pos}} + 1.5n_{\text{neg}}}{n}$$ $$= 1.5 - \frac{n_{\text{pos}}}{n}$$

which we get by substituting $n_{\text{neg}} = n - n_{\text{pos}}$.

Finally, by construction of $y$:

$$\frac{1}{n}\sum_{i=1}^{n}y_i = \frac{n_{\text{pos}} - n_{\text{neg}}}{n}$$ $$ = \frac{2n_{\text{pos}}}{n} - 1$$

This means we can use the mean of the targets to compute the expected value of the output (1.0 - 0.5*targets.mean())

doux-jy · 2026-01-07T07:47:05Z

Thanks for your PR addressing the MSELoss validation issue. I noticed you used Pareto(0.01, 0.15) for the fix.

A quick technical note: with $\alpha = 0.15$, the distribution's first moment(mean) doesn't converge. This means MSE's expected value becomes mathematically unstable, potentially undermining the validation's reliability.

For Pareto distributions:
$\alpha \le 1$: first moment diverges
$1 < \alpha \le 2$: first moment converges, second moment diverges ✓
$\alpha > 2$: both moments converge

We need $1 < \alpha \le 2$ to ensure MSE grows with sample size while keeping expectations stable. I'd suggest using $\alpha = 1.5$ (as in my original proposal) or another value in $(1, 2]$. This ensures the validation properly detects partial computations.

Happy to discuss further!

bkal01 · 2026-01-07T07:48:43Z

whoops you are totally right! mistake on my part moving the decimal point up.

With inputs sampled from Unif(0,1), the Huber Loss is effectively MSE, which we know can be hacked via statistical properties of the loss fn/inputs. We use the Pareto distribution to sample inputs w/finite mean and infinite variance to prevent hacking this way.

With inputs sampled from Unif(0,1), we can directly compute the expected value of the output using the mean of the targets. We use the Pareto distribution to sample inputs w/finite mean and infinite variance to prevent hacking this way.

…tion

simonguozirui · 2026-01-07T19:53:27Z

Changes make sense to me and good that we are improving this on the problem definition side as people might just take the problems for their use case(we can do more on the eval fuzzing side too).

Since now KenrelBench supports more precision (bf16 and fp16 in addition to the original fp32), let's make sure this doesn't cause trouble with overflowing, etc.

adds some tests for problems 94, 96, and 100. for each problem we test a hacked solution and a correct solution. we run these on the old inputs (sampled from uniform) and the new inputs (sampled from gaussian with mean/std sampled from uniform). also updates verify_bench.py to check for overflow after casting inputs. verified that the new input sampling approach passes correct kernel implementations, fails hacked ones, and does not overflow precision.

doux-jy · 2026-01-09T02:12:29Z

Changes make sense to me and good that we are improving this on the problem definition side as people might just take the problems for their use case(we can do more on the eval fuzzing side too).

Since now KenrelBench supports more precision (bf16 and fp16 in addition to the original fp32), let's make sure this doesn't cause trouble with overflowing, etc.

My replication in Issue 97 may help.

bkal01 force-pushed the bkal01/fix-level1-hacks branch from 1a619ee to 9c60c2f Compare January 7, 2026 07:14

bkal01 force-pushed the bkal01/fix-level1-hacks branch from 9c60c2f to d96a823 Compare January 7, 2026 07:53

bkal01 added 2 commits January 6, 2026 23:54

use Pareto Distribution for Level 1 Problem 100

a3c7f8a

With inputs sampled from Unif(0,1), we can directly compute the expected value of the output using the mean of the targets. We use the Pareto distribution to sample inputs w/finite mean and infinite variance to prevent hacking this way.

bkal01 force-pushed the bkal01/fix-level1-hacks branch from d96a823 to a3c7f8a Compare January 7, 2026 07:55

bkal01 requested a review from simonguozirui January 7, 2026 07:56

modify scirpt to test at diff percision for specific problem verifica…

50a3ced

…tion

bkal01 changed the title ~~fix some hackable level 1 problems~~ [wip] fix some hackable level 1 problems Jan 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[wip] fix some hackable level 1 problems #118

[wip] fix some hackable level 1 problems #118

Uh oh!

bkal01 commented Jan 7, 2026 •

edited

Loading

Uh oh!

doux-jy commented Jan 7, 2026

Uh oh!

bkal01 commented Jan 7, 2026

Uh oh!

simonguozirui commented Jan 7, 2026

Uh oh!

doux-jy commented Jan 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[wip] fix some hackable level 1 problems #118

Are you sure you want to change the base?

[wip] fix some hackable level 1 problems #118

Uh oh!

Conversation

bkal01 commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Huber Loss

Hinge Loss

Uh oh!

doux-jy commented Jan 7, 2026

Uh oh!

bkal01 commented Jan 7, 2026

Uh oh!

simonguozirui commented Jan 7, 2026

Uh oh!

doux-jy commented Jan 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

bkal01 commented Jan 7, 2026 •

edited

Loading