Bug: (short issue description) #39

@P4X-ng

Describe the bug

  • Regularly makes fake commits that are not suited for release. If something is "kinda hard", it will regularly claim that it has been "fully implemented and ready for production!"

Expected Behavior

Real code. If something is not implemented or is a placeholder, it should say so.

Current Behavior

  • Most recent: an entire 15,000-line commit was total trash. It was reduced to 5 lines, one of which contained an unsafe string substitution.
  • False claims about what it has done are regular (~25%).
  • I have to check constantly whether what it says is real or hallucinated/made up. That's a good idea with anything AI, but this one wastes enough of my time that it's not worth using.
  • Actually kinda dangerous for those who might be using AI exclusively and don't code well. Certainly time-wasting for those who do. Time is critical for small businesses.
  • The warnings are insufficient and too generic.
  • It's just not cool to release this stuff without heavy guardrails/review by a better model (this is an old Claude, isn't it? I've seen this behavior before).
  • I think I pay for this, right? You have to have an Amazon Q subscription for it. If I pay for this, it sucks; if I don't, releasing free software still comes with SOME responsibility. Please do better.

Reproduction Steps

Do anything complicated, then ask it to implement or build on it.

Possible Solution

  • Build guardrails: have checks and reviews by a better model, even if shorter and less costly.
  • Heuristics might help a little. They would have caught the 15,000-line commit I got that was all trash, and several others that say "placeholder" when the title says "production ready".
  • Use a better model. I think this is Claude 3.5 without many changes to its behavior?
  • Put up a warning that isn't generic: a real one saying that this product has been known to exaggerate what it has actually made.
  • Require reviews. Mine bypasses reviews and goes straight to merging the PR.
  • Auto-reviews by Copilot? Time-limit them if cost is a concern.
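
A rough sketch of the kind of heuristic meant above, in Python. Everything in it is an assumption made up for illustration, not part of the product: the `Commit` shape, the `review_flags` name, the 2,000-line threshold, and the phrase lists. It flags a commit that is suspiciously large, or whose message claims it is finished while the added code still contains placeholder markers.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical threshold and phrase lists, chosen only for illustration.
MAX_CHANGED_LINES = 2000
READY_CLAIMS = re.compile(
    r"production[- ]ready|ready for production|fully implemented",
    re.IGNORECASE,
)
PLACEHOLDER_MARKERS = re.compile(
    r"\b(placeholder|todo|not implemented|stub)\b",
    re.IGNORECASE,
)


@dataclass
class Commit:
    """Hypothetical minimal view of a commit: its message and its added lines."""
    message: str
    added_lines: List[str]


def review_flags(commit: Commit) -> List[str]:
    """Return the reasons this commit should go to review instead of merging."""
    flags = []

    # A very large generated change is worth a second look on size alone.
    if len(commit.added_lines) > MAX_CHANGED_LINES:
        flags.append("oversized")

    # The message says "done" but the code still says "placeholder".
    claims_ready = bool(READY_CLAIMS.search(commit.message))
    has_placeholder = any(
        PLACEHOLDER_MARKERS.search(line) for line in commit.added_lines
    )
    if claims_ready and has_placeholder:
        flags.append("claim-mismatch")

    return flags
```

Any non-empty result would block the straight-to-merge path and route the change to a reviewer (human or model). This is cheap string matching, so it would miss plenty, but it costs almost nothing to run on every commit.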

Additional Information/Context

No response

Metadata

Labels: bug (Something isn't working)