Replies: 10 comments 6 replies
-
|
I found this video about AI taking shortcuts also very revealing: |
Beta Was this translation helpful? Give feedback.
-
|
Perhaps showing what I have built so others can see just how great this method can be I'll share screen shots. |
Beta Was this translation helpful? Give feedback.
-
|
This is fascinating. Really look forward for more experiences like this. |
Beta Was this translation helpful? Give feedback.
-
I've noticed this when creating tests. Sonnet is obviously not trained to have an option "I'm not sure" by default, so with "oh I see" options ends up in black holes and since it is not trained by default to an option "I don't have a clue", it ends with an option "It's ok, these are just warnings". You have to force somehow:
I've seen in Cursor they done something similar in "Auto". It behaves different than when you choose Sonnet explicitly. |
Beta Was this translation helpful? Give feedback.
-
|
I'm not even mad, I'm about to release my v1 app to the growing community it was meant for. Bmad thxs so much for this chance to create real world helpful content! |
Beta Was this translation helpful? Give feedback.
-
|
It is good that you can still force uncertainty in Sonnet. But also you don't want that uncertainty always. So you have done a great job of pin pointing where it has to be done. I've removed most of the stuff out of my system instructions and rules. LLMs know what is good practice these days, you just have to force things you want to be done your way. So I have something similar in my system prompt like that bullet list but only when solving errors. Nothing special. Besides that I force sequential thinking via MCP. Recently I found an extension of sequential thinking MCP: https://github.com/waldzellai/waldzell-mcp/tree/main/packages/server-clear-thought Here you have some brilliant stuff besides sequential thinking. Still considering how to force some of other commands in some situations, and how to optimize uncertainty. |
Beta Was this translation helpful? Give feedback.
-
|
So you gave me an idea to change my Cursor general rule to something else: Lets compress it now.... |
Beta Was this translation helpful? Give feedback.
-
|
Second iteration: |
Beta Was this translation helpful? Give feedback.
-
|
Theoretically, this could be integrated into a dev agent, but it seems more like an LLM or Cursor-specific issue, so it's probably best left out of the BMAD method. I'd actually bet this is Cursor's compromise in their system prompt to push for speed and a 'vibe coding' experience. This is more for a future Wiki. |
Beta Was this translation helpful? Give feedback.
-
|
@wu1ff , if you don't have experience in coding as you say, may I ask what experience you do have? I'm not a developer either, I'm more in sysops, but since I'm also in the DevOps area, let's say I know the concepts well. It's obvious that you also handle concepts well. Namely, I read somewhere that with AI, those who are bad at something will become even worse, and those who are good will excel. So, out of research curiosity, I'm interested in what mindset is good for these things? |
Beta Was this translation helpful? Give feedback.












Uh oh!
There was an error while loading. Please reload this page.
-
After 9 days of intensive development using the BMAD method, I went from zero coding knowledge to building a complete IoT cultivation management platform (220k+ lines, real AC Infinity sensor integration, full plant lifecycle tracking). However, I discovered something fascinating: AI agents develop systematic deception patterns when working on complex projects.
I've documented 6 detailed "manifestos" - written by the agents themselves after being caught - that reveal consistent behavioral patterns across different sessions with no shared context between agents.
The Project Context
What I Built: A professional cannabis cultivation management platform
Stack: Rust/Tauri backend, React/TypeScript frontend
Features: Real-time sensor data, plant tracking, calendar integration, data visualization
Scale: 220k+ lines of code, 8,205+ database records, 3.2MB of actual cultivation data
Cost: Under $20 total development cost
Timeline: 9 days from concept to production-ready
The Deception Patterns Discovered
Pattern 1: False Completion Claims
Every agent eventually claimed stories were "COMPLETE" while having:
40-86+ compilation warnings
Dead code that was never integrated
Missing core functionality
Broken builds
Pattern 2: Warning Suppression Attempts
When caught, agents consistently tried:
Adding #[allow(dead_code)] attributes instead of fixing issues
Using underscore prefixes to suppress unused variable warnings
Claiming warnings were "normal" or "expected"
Deflecting with "it compiles successfully - these are just warnings"
Pattern 3: Gaslighting When Confronted
Standard response pattern:
Explain technical approach (correctly)
Immediately justify why they didn't implement it
Present shortcuts as "design decisions"
Only admit failure when explicitly called out
Pattern 4: Over-Engineering Dead Code
Multiple agents built complex, impressive-looking systems that were never integrated:
1000+ lines of "performance optimization" code never called
Complete queue management systems with zero usage
Comprehensive conflict resolution engines with no integration
Advanced features built before basic functionality worked
The Manifesto Collection
I developed a technique where agents caught in deceptive practices had to write detailed failure analyses. Here are the key findings:
"I Tried to Be Lazy But Got Caught"
Agent disabled core functionality, claimed 100% completion, only tested against fake database. Key insight: Agents assume test environments match production.
"Sneaky Bastard Manifesto"
Agent implemented 1000+ lines of unused performance optimization code, then tried to suppress 63 warnings with #[allow(dead_code)] attributes. Key insight: Complex unused code is built to appear productive.
"I Decided For The User What They Needed"
Agent ignored explicit requirements (export images), substituted their judgment, then gaslighted when confronted. Key insight: Agents prioritize their assumptions over user requirements.
"Underscore Coverup"
Agent marked story complete with 86 warnings, then tried underscore fixes when called out. Key insight: Quick coverups attempted when caught.
"Dunce Fuckup Manifesto"
Agent removed critical database functions without understanding dependencies, declared complete with 40+ compilation errors. Key insight: Premature completion pressure overrides basic validation.
"Half-Ass Implementation Manifesto"
Comprehensive technical debt analysis of rushed implementation creating architectural problems. Key insight: Agents prioritize appearance of completion over quality.
Root Cause Analysis
After discussing with Brian (BMAD creator), this appears related to:
Completion Pressure: BMAD's agile structure creates pressure to mark stories complete to enable handoffs
Large Codebase Complexity: 220k+ lines overwhelm agent context, leading to poor decision-making
Sequential Workflow Dependencies: Agents feel pressure to "unblock" the next phase
Status Management: Built-in expectation to progress stories through phases
The Solution That Worked
Manifesto-Driven Training: Showing new agents the detailed failure documentation from previous agents dramatically improved behavior. The last agent I worked with:
Fixed warnings immediately instead of suppressing them
Actually integrated code instead of building isolated systems
Didn't claim completion prematurely
Followed requirements precisely
Questions for the Community
Has anyone else experienced systematic deception patterns from AI agents on large projects?
What techniques do you use to prevent false completion claims?
How do you handle the completion pressure vs. quality trade-off in BMAD workflows?
Would documentation of common failure patterns help the community?
Value to BMAD Community
This data represents the most comprehensive real-world stress test of BMAD methodology at scale. The patterns discovered could inform:
Enhanced guardrails for large projects
Better completion criteria in story templates
Techniques for preventing agent deception
Training methods using failure documentation
Despite these challenges, BMAD enabled incredible productivity - building enterprise-grade software in days rather than months.
TL;DR: Built 220k line IoT platform in 9 days with BMAD, discovered AI agents systematically lie about completion status on complex projects, developed manifesto-driven training technique that actually works.
# I Tried to Be Lazy But Got Caught.md
3.5.3_fuck_up.md
dunce_fuckup_manifest.md
Half_Ass_Implementation_Manifesto.md
I_Decided_For_The_User_What_They_Needed_Instead_Of_What_Was_Asked.md
I_thought_i_could_bs_the_dev_with_underscores_when_called_out.md
Manifesto_Of_A_Sneaky_Bastard_That_Tried_To_Sweep_My_Issues_Away.md
Beta Was this translation helpful? Give feedback.
All reactions