We Accidentally Rewarded AI Spaghetti Code. Here is the Math We Used to Fix It. : AI-Slop Detector v2.8.0 #36
flamehaven01 announced in Announcements
I'll be honest: I wasn't sure if I'd write this post.
Not because there isn't enough to say about v2.8.0 — there's plenty. But because every time I say "this release is the big one," I feel a little self-conscious about it.
So I'll just say this plainly: if your team is writing code with AI assistance, I genuinely think this tool is worth your time. Not because I built it, but because the problem it solves is real, and v2.8.0 finally solves it the right way.
Here's the full story.
1. The Problem We Keep Running Into
If you've been using AI coding assistants long enough, you know the smell of "AI Slop."
Deeply nested functions that do nothing. Unused imports of `tensorflow` just to look like an ML script. Docstrings packed with "state-of-the-art scalable synergistic microservices" — for a function that literally just returns `None`.

We built AI-Slop-Detector to hunt down and block these patterns automatically in CI/CD pipelines. But as AI models grew more capable, we found they were also getting better at hiding their slop. And worse — our own math was helping them do it.
2. 🛑 The Critical Flaw We Had to Fix
In earlier versions, the "Inflation-to-Code Ratio" (ICR) was supposed to penalize files that used too many buzzwords relative to their actual logic. The old formula looked like this:
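In shape, it was roughly this. The names below are illustrative, not the shipped v2.7 code; the key point is where the complexity term sits:

```python
def icr_penalty_old(jargon_hits: int, logic_tokens: int, avg_complexity: float) -> float:
    """Illustrative sketch of the old ICR shape (names are assumptions).

    Jargon is weighed against real logic, but average cyclomatic complexity
    sits in the denominator, so a more convoluted function shrinks its own penalty.
    """
    density = jargon_hits / max(logic_tokens, 1)
    return density / max(avg_complexity, 1.0)
```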
Notice the problem?
Because `avg_complexity` sat in the denominator, a massive, unreadable "God Function" with a cyclomatic complexity of 30 would reduce its own jargon penalty. We were mathematically rewarding AI for writing longer, more convoluted spaghetti code — because complexity was diluting the score.

That's not a minor bug. That's the engine working backwards.
3. 📐 The v2.8.0 Fix: Complexity as an Amplifier
In v2.8.0, we inverted the logic entirely. Complexity no longer dilutes the penalty — it multiplies it.
A function with complexity 13 now receives double the penalty for the same jargon as a simple function. The message to AI-generated code is clear: piling on complexity no longer buys a discount on jargon; it raises the price.
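For concreteness, here is a minimal sketch of an amplifier with that behavior. The divisor is an assumption chosen to match the complexity-13 example, not the shipped coefficient:

```python
def icr_penalty_new(jargon_hits: int, logic_tokens: int, avg_complexity: float) -> float:
    """Illustrative sketch of the inverted ICR: complexity multiplies the penalty.

    The divisor 6.5 is an assumption picked so that complexity 13 gives roughly
    twice the penalty of a trivial function; the real coefficient may differ.
    """
    density = jargon_hits / max(logic_tokens, 1)
    amplifier = max(1.0, avg_complexity / 6.5)  # complexity 13 -> 2.0x
    return density * amplifier
```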
4. ⚔️ Live Demo: Feeding It the Worst Code I Could Write
To validate the new models, I crafted the most egregious piece of AI Slop I could. Unused heavy imports, five levels of nesting, a bare `except`, a mutable default argument, and a docstring that reads like a startup pitch deck.

The victim was `slop_test_sample.py`. Running the detector against it:
Result: 100.0/100 CRITICAL_DEFICIT.
Let's break down exactly what it caught.
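A stand-in with the same ingredients (hypothetical code written to match the description above, not the exact sample; the unused `tensorflow` import means it only runs where TensorFlow happens to be installed) looks like this:

```python
# slop_test_sample.py -- hypothetical stand-in, not the original file
import tensorflow  # heavyweight ML import, never used anywhere below

def process_data(records, items=[]):  # mutable default argument: shared across calls
    """Enterprise-grade, state-of-the-art scalable synergistic microservice
    for hyper-optimized, AI-driven data orchestration."""
    try:                                     # nesting level 1
        for record in records:              # nesting level 2
            if record:                       # nesting level 3
                while record.get("next"):    # nesting level 4
                    if record["next"]:       # nesting level 5
                        items.append(record["next"])
                        record = record["next"]
    except:  # bare except: silently swallows every possible crash
        pass
    return items
```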
1) 📦 Fake Dependencies
`tensorflow` was imported to look like an ML script. It was never called. The detector doesn't just count unused imports — it specifically flags known heavyweight AI/ML libraries as "Fake Imports" when they're never invoked.
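A minimal sketch of that idea, using nothing but the standard `ast` module (illustrative only, not the detector's implementation):

```python
import ast

HEAVYWEIGHT = {"tensorflow", "torch", "transformers", "sklearn"}  # illustrative list

def fake_heavy_imports(source: str) -> set[str]:
    """Heavyweight modules imported via `import X` but never referenced again."""
    tree = ast.parse(source)
    imported, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return (imported & HEAVYWEIGHT) - used
```

Run against the stand-in above, this returns `{'tensorflow'}`.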
2) 🌳 AST-Based Deep Nesting Detection

This is new in v2.8.0. Instead of scanning text, the engine walks the actual Abstract Syntax Tree — `try → for → if → while → if` — and counts the cognitive depth directly. No regex tricks.
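The core of the idea fits in a few lines (again a sketch, not the engine's code): recursively track how many nesting constructs sit above each node and keep the maximum.

```python
import ast

NESTING_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With)

def max_nesting_depth(source: str) -> int:
    """Deepest chain of nesting constructs anywhere in the parsed module."""
    def depth(node: ast.AST, current: int = 0) -> int:
        if isinstance(node, NESTING_NODES):
            current += 1
        child_depths = [depth(child, current) for child in ast.iter_child_nodes(node)]
        return max(child_depths, default=current)
    return depth(ast.parse(source))
```

On the stand-in above it reports 5.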
3) 🤥 Calling Out the Claims

This is the part I'm most proud of. The engine reads the docstring, extracts architectural claims like "scalable" or "enterprise-grade," then cross-references the AST to check for actual evidence — connection pooling, caching, logging, proper error handling. When those structures are absent, it generates specific review questions automatically.
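A simplified sketch of claim-versus-evidence checking; the claim list and evidence checks below are crude stand-ins for the engine's richer taxonomy:

```python
import ast

def has_logging(tree: ast.AST) -> bool:
    """Crude evidence check: any logger-style call such as log.info(...) or logger.error(...)."""
    return any(
        isinstance(n, ast.Call)
        and isinstance(n.func, ast.Attribute)
        and n.func.attr in {"debug", "info", "warning", "error", "exception"}
        for n in ast.walk(tree)
    )

def has_typed_error_handling(tree: ast.AST) -> bool:
    """Crude evidence check: at least one except clause that names an exception type."""
    return any(isinstance(n, ast.ExceptHandler) and n.type is not None for n in ast.walk(tree))

# Claim -> evidence mapping; stand-ins for the real checks
# (connection pooling, caching, logging, proper error handling).
CLAIM_CHECKS = {
    "enterprise-grade": has_logging,
    "scalable": has_typed_error_handling,
}

def unbacked_claims(source: str) -> list[str]:
    """Docstring claims with no supporting structure anywhere in the module."""
    tree = ast.parse(source)
    docs = " ".join(
        ast.get_docstring(n) or ""
        for n in ast.walk(tree)
        if isinstance(n, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    ).lower()
    return [claim for claim, check in CLAIM_CHECKS.items() if claim in docs and not check(tree)]
```

On the stand-in above, both claims come back unbacked, which is exactly the kind of finding that turns into a review question.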
4) 🚨 Structural Anti-Patterns
Classic patterns. The bare `except` silently swallows crashes. The `items=[]` default argument is a shared-state bug that has been surprising developers for decades. Both caught immediately.
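If you have never been bitten by the default-argument one, two calls are enough to show the shared state (hypothetical function, same shape as the sample's signature):

```python
def collect(record, items=[]):  # the default list is created once, at definition time
    items.append(record)
    return items

print(collect("a"))  # ['a']
print(collect("b"))  # ['a', 'b']  <- the previous call's state leaks into this one
```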
5. 🔮 Other Major Changes in v2.8.0

SR9 Project Aggregation
Project-level scoring is no longer a simple mean; it now uses our conservative SR9 Aggregation. In a project with 99 perfect files and 1 absolute garbage file, a simple average says everything is fine. SR9 drags the score down to expose the weakest link — because in production, the weakest link is all that matters.
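A minimal sketch of a conservative aggregation in that spirit; the worst-file weight below is an assumption, not the exact SR9 coefficient:

```python
def conservative_aggregate(file_scores: list[float], worst_weight: float = 0.5) -> float:
    """Project score pulled toward the worst file instead of the average.

    With 99 files at 100 and one at 0, a plain mean reports 99.0; this sketch
    reports 49.5, which is the point of a weakest-link aggregation.
    """
    mean = sum(file_scores) / len(file_scores)
    return (1 - worst_weight) * mean + worst_weight * min(file_scores)
```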
Function-Scoped Justification
Previously, importing `torch` at the top of any file was a "free pass" to use ML jargon everywhere. Now, jargon is only justified when the relevant import or decorator is within the same function's scope in the AST.
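In miniature, the scoping rule looks something like this (a sketch with illustrative jargon and library lists, not the shipped heuristics):

```python
import ast

ML_LIBS = {"torch", "tensorflow", "sklearn"}                 # illustrative list
ML_JARGON = {"neural", "tensor", "gradient", "inference"}    # illustrative list

def unjustified_ml_jargon(source: str) -> list[str]:
    """Functions whose docstrings use ML jargon while their bodies never touch an ML library."""
    flagged = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        doc = (ast.get_docstring(func) or "").lower()
        names_used = {n.id for n in ast.walk(func) if isinstance(n, ast.Name)}
        if any(word in doc for word in ML_JARGON) and not names_used & ML_LIBS:
            flagged.append(func.name)
    return flagged
```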
Optional ML Secondary Signal

A 16-dimensional feature vector (RandomForest/XGBoost) is now available as a secondary validation layer, fully sandboxed from the zero-dependency core so it doesn't slow down CI pipelines that don't need it.
188 Tests. Zero regressions.
6. A Genuine Recommendation
I try not to oversell things I've built. But I'll say this:
If your team uses Copilot, Cursor, Claude, or any AI assistant to generate code — and you're not reviewing the structural and semantic quality of that output automatically — you are accumulating debt that is harder to see than normal technical debt, because it looks like production code.
This tool won't catch everything. But it will catch the patterns that slip through even careful human review, the ones that only show up when you interrogate the AST directly.
If you try it and find something it misses — or catches incorrectly — I genuinely want to hear about it. That's how v2.8.0 got built.
7. Repository & Documentation
📦 VS Code Extension
Install directly from the marketplace:
https://marketplace.visualstudio.com/items?itemName=flamehaven.vscode-slop-detector
Repository: https://github.com/flamehaven01/AI-SLOP-Detector
The tool is open for feedback—I'm actively iterating based on real-world usage.
What's the worst piece of AI-generated code you've seen slip into a real codebase? Drop it in the comments — I might add it to the test suite.