Skip to content

fix: raise JS/TS noise floor to reduce false positive speedups#1820

Open
mohammedahmed18 wants to merge 1 commit intomainfrom
fix/js-higher-noise-floor
Open

fix: raise JS/TS noise floor to reduce false positive speedups#1820
mohammedahmed18 wants to merge 1 commit intomainfrom
fix/js-higher-noise-floor

Conversation

@mohammedahmed18
Copy link
Contributor

Summary

  • Adds a 3x noise multiplier for JavaScript/TypeScript in the speedup critic, raising the acceptance threshold from 5% to 15% (and from 15% to 45% for fast functions <10μs)
  • Extracts duplicated noise floor logic into a shared compute_noise_floor() helper used by both speedup_critic() and get_acceptance_reason()
  • Adds tests for the new JS noise floor behavior and verifies Python thresholds are unchanged

Motivation

Analysis of 73 JS optimization PRs on a Strapi project revealed widespread false positives. Separate V8 processes have 15%+ JIT/GC variance between runs, but the acceptance threshold was only 5% — meaning process-level noise routinely exceeded the bar. For example, PR #40 claimed 10.6% speedup for a purely syntactic change (removing { return } braces), while functionally identical candidates showed a 16% swing.

Language Runtime ≥ 10μs Runtime < 10μs
Python 5% 15%
JS/TS 15% 45%

Test plan

  • All 14 critic tests pass (11 existing + 4 new)
  • New tests verify: Python thresholds unchanged, JS/TS get 3x multiplier, TypeScript treated same as JS
  • Integration test: 10.5% speedup accepted for Python but rejected for JS

🤖 Generated with Claude Code

…eedups

Separate V8 processes have significant JIT/GC variance (15%+) that causes
false positive speedups at the current 5% threshold. This raises the JS/TS
noise floor to 15% (45% for <10μs functions) via a 3x multiplier.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link
Contributor

claude bot commented Mar 12, 2026

Claude finished @mohammedahmed18's task in 2m 47s —— View job


PR Review Summary

  • Triage: assess PR scope
  • Lint and typecheck changed files
  • Resolve stale review threads
  • Code review
  • Duplicate detection
  • Test coverage analysis

Prek Checks

All ruff checks and formatting pass cleanly. No issues to fix.

mypy reports one error (Module "codeflash.models.models" does not explicitly export attribute "TestType" in tests/test_critic.py:8), but this is a pre-existing issue unrelated to this PR — the import was already present before these changes.


Code Review

The PR is correct and well-structured. One minor issue found:

Stale docstring in speedup_critic (codeflash/result/critic.py:99-101)

The docstring still says:

"Currently, the noise floor is 2xMIN_IMPROVEMENT_THRESHOLD when the original runtime is less than 10 microseconds"

It should say 3x (this was the pre-existing value) and should now also mention the JS/TS multiplier. The compute_noise_floor() docstring is accurate, but this caller's docstring is now stale. Low severity — doesn't affect behavior. Fix this →

Everything else looks good:

  • The compute_noise_floor() extraction correctly preserves all existing behavior for Python — verified by the fact that the old noise_floor = noise_floor * 2 assignment became noise_floor *= 2 in the new helper.
  • get_acceptance_reason() calls compute_noise_floor() without disable_gh_action_noise, defaulting to False — this correctly maintains the previous behavior (CI noise doubling was always applied in that function).
  • is_javascript() correctly covers both Language.JAVASCRIPT and Language.TYPESCRIPT, so the TypeScript case is properly handled.

Duplicate Detection

No duplicates detected. The compute_noise_floor() helper is a clean extraction of duplicated logic within the same file (two callers: speedup_critic and get_acceptance_reason). No similar logic was found elsewhere in the codebase.


Test Coverage

All 14 tests pass (11 existing + 3 new noise floor tests + 1 integration test).

codeflash/result/critic.py: 74% coverage (from critic tests alone).

  • Missing lines: 41 (the CI-noise doubling branch — expected, tests use disable_gh_action_noise=True), 170-203 (get_acceptance_reason body — pre-existing gap), 224 (pre-existing).
  • The new compute_noise_floor() function is well-covered by the new tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants