A Real-World Test: Can we use "Vibe Coding" (describing what we want in plain English to AI) to build a "Vibe Cleaner" (a tool that organizes messy Downloads folders through natural language)?
My Experience: Mixed results. This repository represents both the promise and current limitations of agentic AI workflows.
VibeCleaner is an experimental AI-powered tool that attempts to understand natural language requests to organize your Downloads folder. It uses Claude or Codex AI to interpret your intent and execute file organization tasks.
Important Context: This is my first public repository, born from testing whether AI agents can handle real-world file management. The results revealed significant challenges that are documented here for transparency and collaborative improvement.
VibeCleaner transforms your chaotic Downloads folder into an organized file system by:
- Auto-Organization: Sorts files into categorized folders (Documents, Images, Videos, Archives, etc.)
- Duplicate Detection: Identifies and removes duplicate files to save disk space
- Old File Management: Archives or deletes files older than specified periods
- Smart Filtering: Recognizes file patterns and sorts based on customizable rules
- Safe Operations: Dry-run mode shows what will happen before making changes
AI-Powered Natural Language Interface:
- Talk to VibeCleaner like a human - no commands to memorize
- Understands requests like "organize my downloads" or "delete old stuff"
- Works with Claude or Codex AI assistants
- Interactive chat mode for conversational cleaning
- Smart fallback to rule-based cleaning if AI unavailable
File Type Recognition: Automatically categorizes files by extension and content (see the sketch after this list)
- Documents (PDF, DOCX, TXT, etc.)
- Images (JPG, PNG, GIF, etc.)
- Videos (MP4, AVI, MKV, etc.)
- Archives (ZIP, RAR, 7Z, etc.)
- Code files (PY, JS, HTML, etc.)
- Audio (MP3, WAV, FLAC, etc.)
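
A minimal sketch of what extension-based categorization like this can look like, using a plain dictionary; the `CATEGORIES` map and `categorize` helper below are illustrative, not VibeCleaner's actual internals:

```python
from pathlib import Path

# Hypothetical extension-to-category map; the real tool's rules are configurable.
CATEGORIES = {
    "Documents": {".pdf", ".doc", ".docx", ".txt", ".odt"},
    "Images":    {".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp"},
    "Videos":    {".mp4", ".avi", ".mkv", ".mov", ".wmv"},
    "Archives":  {".zip", ".rar", ".7z", ".tar", ".gz"},
    "Code":      {".py", ".js", ".html", ".css"},
    "Audio":     {".mp3", ".wav", ".flac"},
}

def categorize(path: Path) -> str:
    """Return a category name for a file based on its extension."""
    ext = path.suffix.lower()
    for category, extensions in CATEGORIES.items():
        if ext in extensions:
            return category
    return "Other"  # fallback bucket for unrecognized extensions

if __name__ == "__main__":
    for f in (Path.home() / "Downloads").iterdir():
        if f.is_file():
            print(f"{f.name} -> {categorize(f)}")
```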
Intelligent Rules Engine:
- Move files based on age
- Sort by file size thresholds
- Custom naming patterns
- Subfolder organization by date
Safety Features (see the sketch after this list):
- Preview mode to see changes before execution
- Undo functionality for recent operations
- Whitelist important files
- Backup before deletion option
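
A minimal sketch of the preview-then-apply pattern behind dry runs and undo, assuming moves are recorded to a JSON log so they can be reversed; the function names and log location are illustrative, not the tool's actual API:

```python
import json
import shutil
from pathlib import Path

UNDO_LOG = Path.home() / ".vibecleaner_undo.json"  # hypothetical log location

def plan_moves(folder: Path, dest: Path) -> list[tuple[str, str]]:
    """Build a list of (source, destination) pairs without touching anything."""
    return [(str(f), str(dest / f.name)) for f in folder.iterdir() if f.is_file()]

def apply_moves(moves: list[tuple[str, str]], dry_run: bool = True) -> None:
    """Print the plan in dry-run mode; otherwise move files and record an undo log."""
    for src, dst in moves:
        print(f"{'WOULD MOVE' if dry_run else 'MOVING'}: {src} -> {dst}")
        if not dry_run:
            Path(dst).parent.mkdir(parents=True, exist_ok=True)
            shutil.move(src, dst)
    if not dry_run:
        UNDO_LOG.write_text(json.dumps(moves))

def undo_last() -> None:
    """Reverse the most recent batch of moves using the recorded log."""
    for src, dst in json.loads(UNDO_LOG.read_text()):
        shutil.move(dst, src)
```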
Scheduling Options:
- Run on system startup
- Periodic cleaning (hourly, daily, weekly)
- Watch folder for real-time organization
- Manual trigger via CLI or GUI
Use Cases:
- Daily Cleanup: Automatically organize downloads every day
- Project Files: Sort project files into dedicated folders
- Media Management: Organize photos and videos by date
- Document Filing: Keep documents sorted by type and date
- Disk Space Recovery: Remove old and duplicate files
- Download History: Maintain organized archive of downloads
Through testing, I discovered significant challenges with current AI agents:
- Large Folder Failure: Both Claude and Codex lose context with too many files
- Attention Drift: AI agents "forget" instructions during complex operations
- Inconsistent Results: Same prompt, different outcomes
- Sandbox Limitations: Codex has severe restrictions that limit real-world usage
- Context Window Issues: Can't handle folders with 1000+ files effectively
- Claude: Excellent at generating code, struggled with file management logic
- Codex: Good on small folders despite sandbox limitations
- Fallback Rules: Had to implement non-AI backup logic for reliability
This repository packages the working scripts generated by AI with:
- Task decomposition into smaller passes
- Folder learning systems
- OCR integration for document classification
- Extensive guardrails and safety checks
- Manual fallback for when AI fails
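
Task decomposition was the guardrail that mattered most in practice. A minimal sketch of splitting a large folder into fixed-size batches so that no single AI pass sees the whole listing; `process_batch` is a hypothetical callback standing in for whatever handles each pass:

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator

def batched(files: Iterable[Path], size: int) -> Iterator[list[Path]]:
    """Yield successive fixed-size batches so no single AI call sees the whole folder."""
    batch: list[Path] = []
    for f in files:
        batch.append(f)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def organize_in_passes(folder: Path, process_batch: Callable[[list[Path]], None],
                       batch_size: int = 25) -> None:
    """Decompose one big organization task into many small, independently checkable passes."""
    files = sorted(p for p in folder.iterdir() if p.is_file())
    for i, batch in enumerate(batched(files, batch_size), start=1):
        print(f"Pass {i}: {len(batch)} files")
        process_batch(batch)  # e.g. send only these file names to the AI agent
```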
Initial small test:
- Started with: a smaller test folder (~100 files)
- Result: AI performed reasonably well
- Organization: Clean categorization achieved

Main Downloads folder:
- Initial count: ~2,800 files
- Archive files: Multiple large ZIPs and RARs
- The explosion: After extracting archives, the file count multiplied
- Current state: 579 files (after partial cleanup attempts)
I asked AI to implement OCR → Interpretation → Smart Renaming:
- Goal: Read document content, understand context, rename intelligently
- Reality: Complete chaos
- Misinterpreted documents
- Created nonsensical names
- Lost original file references
- Duplicate naming conflicts
- System nearly unusable
Test Environment 1 (OneDrive Workspace):
- Starting File Count: 579 files
- Organization Status: 12 manually created subdirectories
- Unprocessed Archives: 13 ZIP files still pending
- File Type Diversity: 50+ different extensions
- AI Performance: Handled well due to manageable scale
- Result: Success - AI could maintain context and follow instructions effectively
Test Environment 2 (Main Downloads):
- Starting File Count: 2,800 files (manageable scale initially)
- The Fatal Request: "Unzip downloads" - asked AI to extract archive files
- Archive Explosion: ZIP extractions caused a massive file count explosion to 15,323 files
- Context Loss Crisis: AI lost ability to group related files that came from same archives
- Folder Meaning Collapse:
  - AI couldn't understand that files extracted from "Project_Alpha.zip" belonged together
  - Lost semantic relationships between related documents
  - No experience handling the concept that folder structure inside archives has meaning
- AI Renaming Catastrophe Evidence:
  - Version number chaos: `Company_A_2024-12-26_v1_v2_v3_[...through v36] (9).pdf`
  - Automated prefixes: files renamed with `20250824_AGR_`, `20250824_PRES_`, and `20250824_TECH_` prefixes, showing failed categorization
  - Lost all original archive context and semantic groupings
- Scale Impact: 5.4x file explosion (2,800 → 15,323) broke AI's processing capabilities
- Result: Complete organizational breakdown - files became unfindable, semantic relationships destroyed
Complex File Naming Patterns:
- Files like "Complete_with_DocuSign_*" (multiple variations)
- Version numbers: "Report (1).xlsx" through "Report (6).xlsx"
- Date formats: Multiple inconsistent date patterns
- Similar names: "2021 Tax Return Person_A.pdf" vs "2021 Tax Return Person_A1.pdf"
Context-Dependent Organization:
- Same file type belongs in different folders based on content
- PDFs could be: Tax docs, Contracts, Reports, or Insurance
- Excel files could be: Financial, Analytics, or Technical
- AI couldn't determine context without reading file content
Business Logic Complexity:
- Corporate files require domain knowledge
- Legal vs Financial vs HR distinctions
- Temporal relevance (2021 vs 2022 tax files)
- Relationship between files (invoice + receipt pairs)
Scale Explosion Issues:
- Started manageable (100 files) → Success
- Scaled to 2,800 files → Performance degraded
- Archive extraction → Exponential growth
- Context window shattered at ~100 files
- AI completely lost at 500+ files
- Each iteration made organization worse
- OCR attempt created irreversible chaos
The Compounding Failure Pattern:
- Small folder success gave false confidence
- Scaled up to main Downloads → partial failure
- Added complexity (OCR) → complete breakdown
- Each "smart" feature multiplied failure modes
- Recovery required manual intervention
What actually worked:
- Small Batches: Processing 20-30 files at a time worked
- Clear Extensions: .jpg, .png consistently categorized
- Simple Rules: Date-based archival was reliable
- Duplicate Detection: Hash-based matching worked well
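
Of these, hash-based duplicate detection needs no AI at all. A minimal sketch using SHA-256 content hashes; the grouping logic is illustrative rather than the exact code in this repository:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(folder: Path) -> dict[str, list[Path]]:
    """Group files by content hash; any group with more than one entry is a duplicate set."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for f in folder.rglob("*"):
        if f.is_file():
            groups[file_hash(f)].append(f)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}

if __name__ == "__main__":
    for digest, paths in find_duplicates(Path.home() / "Downloads").items():
        print(digest[:12], [p.name for p in paths])
```

Even with exact content matching, deleting duplicates is only as safe as the backup behind it; preview the groups before removing anything.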
I gave two different folders to Codex and Claude, then made them learn from each other's approaches. The evolution of their strategies revealed fundamental differences in how AI agents approach problems.
Codex's approach:
- Machine Learning Focus: Proposed training classification models on file patterns
- Learning Gates: Suggested implementing attention gates to maintain context
- Pattern Recognition: Wanted to build neural networks for file type detection
- Embedding Systems: Proposed vector embeddings for semantic file similarity
- Reinforcement Learning: Suggested reward systems for correct categorization
- Overly Complex: Often over-engineered solutions for simple problems
Claude's approach:
- Rule-Based Systems: Started with deterministic rules before adding AI
- Hierarchical Organization: Focused on folder tree structures
- Metadata Extraction: Emphasized using file properties and timestamps
- Batch Processing: Suggested chunking strategies for large folders
- Fallback Mechanisms: Always included non-AI backup plans
- Pragmatic: Preferred simple solutions that worked reliably
When I shared Codex's ideas with Claude and vice versa:
- Claude + Codex Ideas: Claude simplified Codex's ML proposals into practical heuristics
- Codex + Claude Ideas: Codex tried to mathematically formalize Claude's rules
- Convergence Issues: They rarely agreed on implementation details
- Different Priorities: Codex optimized for accuracy, Claude for reliability
The two AIs had fundamentally different problem-solving philosophies:
- Codex: "Let's build a system that learns"
- Claude: "Let's build a system that works"
This divergence highlights a core challenge in vibe coding: different AI agents interpret the same request through vastly different lenses, making consistent results nearly impossible.
Test Environment 1 (OneDrive Workspace - 579 files):
- Success at Small Scale: AI handled 579 files effectively
- Maintained Context: Could follow instructions and maintain organization
- Result: Usable system with 12 organized subdirectories
Test Environment 2 (Main Downloads - The Archive Explosion):
- Started Manageable: 2,800 files - initially within AI's capabilities
- Fatal Request: "Unzip downloads" - asked AI to extract archives
- Archive Explosion: File count exploded 5.4x (2,800 → 15,323 files)
- Context Collapse: AI lost understanding of semantic file relationships from archives
- Total System Failure: Files became unfindable, all semantic groupings destroyed
- Small Success: 100 files organized perfectly → False confidence
- Medium Attempt: 500-2,800 files partially worked → Pushed forward
- Feature Creep: Added OCR "smartness" → Amplified existing problems
- Scale Disaster: Large folder (15,323 files) → Complete system breakdown
When I asked AI to "make it smarter" with OCR:
- Request: "Read PDFs, understand content, rename intelligently"
- AI Response: Enthusiastically implemented OCR + NLP + Renaming
- Catastrophic Results:
- "Invoice_2023.pdf" β "document_about_financial_transactions_possibly_related_to_business.pdf"
- "Contract_Draft.pdf" β "legal_text_with_multiple_parties_mentioned.pdf"
- Version explosion: `Company_A_2024-12-26_v1_v2_v3_[...through v36] (9).pdf`
- Automated categorization gone wrong: random `20250824_AGR_`, `20250824_PRES_`, `20250824_TECH_` prefixes
- Lost all original context and meaningful file names
- Created hundreds of naming conflicts
- Generated massive empty folder structures that AI couldn't clean up
- Files became completely unfindable in the 15,323-file chaos
The Vibe Coding Trap: AI's enthusiasm to help can destroy your data. Every "smart" feature is a potential catastrophe at scale. The same AI that brilliantly organizes 100 files will confidently destroy 15,323 files with equal enthusiasm, creating chaos that's nearly impossible to recover from manually.
Token Scale Mismatch: AI agents are fundamentally designed for code and limited text processing. Documents and OCR content require millions of tokens at scale - far beyond what current AI agents can efficiently handle. They resist or fail catastrophically when pushed beyond their optimal token ranges.
Instruction Following Breakdown: The massive failure wasn't just about scale - it revealed severe job fit issues. AI agents optimized for code understanding showed tremendous weakness in instruction following for document organization tasks. The tooling and specialization gap became apparent.
Specialization Requirements: This experiment proves that AI agents require:
- Proper Task Specialization: File organization needs different AI architectures than code analysis
- Well-Crafted Tooling: Generic AI agents need specialized tools and constraints for domain-specific tasks
- Token-Aware Design: Systems must account for the token limitations and processing capabilities of the underlying AI
- Archive Context Awareness: AI agents lack understanding that files from the same archive belong together semantically
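
Token-aware design largely means never handing the agent more than it can hold. A minimal sketch that estimates token usage with a rough characters-divided-by-four heuristic and splits a folder listing accordingly; both the heuristic and the budget value are assumptions, not measured limits of any particular model:

```python
from pathlib import Path

def estimate_tokens(text: str) -> int:
    """Very rough heuristic: roughly 4 characters per token for English-like text."""
    return max(1, len(text) // 4)

def chunk_listing(files: list[Path], token_budget: int = 4000) -> list[list[str]]:
    """Split a folder listing into chunks that each stay under the assumed token budget."""
    chunks: list[list[str]] = [[]]
    used = 0
    for f in files:
        line = f.name
        cost = estimate_tokens(line)
        if used + cost > token_budget and chunks[-1]:
            chunks.append([])
            used = 0
        chunks[-1].append(line)
        used += cost
    return chunks

listing = sorted(p for p in (Path.home() / "Downloads").glob("*") if p.is_file())
for i, chunk in enumerate(chunk_listing(listing), start=1):
    print(f"Chunk {i}: {len(chunk)} file names")
```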
The Archive Problem: AI agents have no concept that when you extract "ProjectAlpha.zip", all the resulting files maintain semantic relationships. They treat each extracted file as independent, destroying the original organizational intent embedded in the archive structure.
The Deeper Lesson: Vibe coding works when the AI agent's design matches the task domain. Using code-optimized agents for document management is like using a text editor to edit video - technically possible, spectacularly ineffective. The archive explosion disaster reveals that AI agents lack fundamental concepts about file relationships that humans take for granted.
- AI Provider: Install the Claude CLI or Codex CLI for AI features
  - Claude: `pip install claude-cli` (or your Claude installation method)
  - Codex: `npm install -g codex-cli` (or your Codex installation method)
- Set environment variables if needed:
  - `VIBECLEANER_CLAUDE_CMD=claude`
  - `VIBECLEANER_CODEX_CMD=codex`
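
A minimal sketch of how these environment variables might be consumed to pick a provider, with a fallback to rule-based cleaning when neither CLI is on the PATH; the helper names are illustrative, and only the variable names above come from this project:

```python
import os
import shutil

def pick_ai_command() -> str | None:
    """Return the configured AI CLI if one is installed on the PATH, else None."""
    for env_var, default in (("VIBECLEANER_CLAUDE_CMD", "claude"),
                             ("VIBECLEANER_CODEX_CMD", "codex")):
        cmd = os.environ.get(env_var, default)
        if shutil.which(cmd):
            return cmd
    return None

def handle_request(prompt: str) -> None:
    cmd = pick_ai_command()
    if cmd is None:
        # No AI CLI available: fall back to deterministic, rule-based cleaning.
        print("No AI provider found; using rule-based cleaning instead.")
        return
    # Exact invocation flags differ per provider CLI; this only shows provider selection.
    print(f"Would send to '{cmd}': {prompt}")

handle_request("organize my downloads")
```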
```bash
# Install from PyPI
pip install vibecleaner

# Or install from source
git clone https://github.com/yourusername/vibecleaner.git
cd vibecleaner
pip install .

# Initialize VibeCleaner
vibecleaner init
```

```bash
# Ask in natural language
vibecleaner ask "clean up my messy downloads folder"
vibecleaner ask "find and delete duplicate photos"
vibecleaner ask "organize PDFs from last month"
vibecleaner ask "what files are taking up the most space?"

# Interactive chat mode
vibecleaner chat
# Then chat naturally:
# You: my downloads are a mess, help!
# Assistant: I'll help you organize...

# Apply AI suggestions automatically
vibecleaner ask "remove old files" --apply
```

```bash
# Preview what will be cleaned (dry run)
vibecleaner clean --dry-run

# Clean downloads folder
vibecleaner clean

# Clean with specific rules
vibecleaner clean --older-than 30 --duplicates

# Watch folder for real-time organization
vibecleaner watch ~/Downloads

# Schedule automatic cleaning
vibecleaner schedule --daily --time 09:00
```
Create a `.vibecleaner.yml` in your home directory:
```yaml
# Download folder path (default: ~/Downloads)
downloads_path: ~/Downloads

# Organization rules
organize:
  Documents:
    extensions: [pdf, doc, docx, txt, odt]
    path: ~/Documents/Downloads
  Images:
    extensions: [jpg, jpeg, png, gif, svg, webp]
    path: ~/Pictures/Downloads
  Videos:
    extensions: [mp4, avi, mkv, mov, wmv]
    path: ~/Videos/Downloads
  Archives:
    extensions: [zip, rar, 7z, tar, gz]
    path: ~/Downloads/Archives

# Cleanup rules
cleanup:
  delete_after_days: 90
  archive_after_days: 30
  min_file_size: 1MB  # Ignore small files
  remove_duplicates: true

# Safety settings
safety:
  dry_run_default: true
  backup_before_delete: true
  whitelist_patterns:
    - "important_*"
    - "*.key"
    - "*.license"
```
```bash
# Add custom organization rule
vibecleaner rule add --name "Screenshots" \
  --pattern "Screen Shot*" \
  --destination ~/Pictures/Screenshots

# Remove old downloads
vibecleaner clean --older-than 60d --min-size 100MB

# Find and remove duplicates
vibecleaner duplicates --remove --keep-newest
```

```bash
# Linux/Mac: Add to crontab
vibecleaner schedule --cron "0 9 * * *"

# Windows: Add to Task Scheduler
vibecleaner schedule --windows --daily --time 09:00

# Run as daemon/service
vibecleaner daemon start
```

```bash
# Show cleaning statistics
vibecleaner stats

# Generate cleaning report
vibecleaner report --last-30-days

# Show disk space saved
vibecleaner savings
```
| Command | Description |
|---|---|
| `vibecleaner init` | Initialize configuration |
| `vibecleaner clean` | Clean downloads folder |
| `vibecleaner watch` | Watch folder for real-time organization |
| `vibecleaner schedule` | Set up automatic cleaning |
| `vibecleaner undo` | Undo last cleaning operation |
| `vibecleaner stats` | Show cleaning statistics |
| `vibecleaner config` | Edit configuration |
- No Cloud Dependency: Works entirely offline
- Local Processing: Your files never leave your machine
- Transparent Operations: Full logs of all actions
- Reversible Changes: Undo support for recent operations
- No Data Collection: Zero telemetry or analytics
Licensed under the MIT License. See the `LICENSE` file for details.
This project needs your help! The agentic workflow shows promise but requires significant improvement.
- Context Management: Better handling of large folders (1000+ files)
- Attention Systems: Prevent AI from losing track during long operations
- Prompt Engineering: More robust prompts that work consistently
- Edge Cases: Handle unusual file types and folder structures
- Provider Integration: Better Claude/Codex integration, add more providers
- Learning Systems: Implement folder pattern recognition
- Testing: Test on diverse, real-world messy folders
- Fork the repository
- Test on your own messy Downloads folder
- Document what breaks
- Submit fixes or improvements
- Share your experience with agentic workflows
Special Interest: If you've successfully implemented attention management or context preservation in agentic systems, your expertise would be invaluable.
See `CONTRIBUTING.md` for technical guidelines.
- Deeper analysis and design recommendations: `docs/REFLECTIONS.md`
- arXiv-style writeup of the experiment: `docs/arxiv-draft.md`
- Agents Need Boundaries: Unlimited scope leads to failure. Break tasks into small, well-defined chunks.
- Context is Everything: Current AI loses context quickly. Solution: implement checkpoint systems (see the sketch after this list).
- Trust but Verify: Always preview before execution. AI confidence doesn't equal correctness.
- Fallbacks are Essential: When AI fails (and it will), have deterministic rules ready.
- Prompt Management is Complex: What works once might fail next time. Version and test your prompts.
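
Checkpointing, mentioned under "Context is Everything" above, is one of the simpler pieces of that missing infrastructure. A minimal sketch that records which files have already been handled so an interrupted or context-lost run can resume instead of starting over; the checkpoint path and format are assumptions:

```python
import json
from pathlib import Path

CHECKPOINT = Path.home() / ".vibecleaner_checkpoint.json"  # hypothetical location

def load_done() -> set[str]:
    """Load the set of already-processed file paths, if a checkpoint exists."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def save_done(done: set[str]) -> None:
    CHECKPOINT.write_text(json.dumps(sorted(done)))

def resumable_pass(folder: Path) -> None:
    """Process only files not seen in a previous run, checkpointing after each one."""
    done = load_done()
    for f in sorted(p for p in folder.iterdir() if p.is_file()):
        if str(f) in done:
            continue
        print(f"Processing {f.name}")  # hand this file (or a small batch) to the agent
        done.add(str(f))
        save_done(done)
```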
The Promise: Describe what you want, AI handles implementation.
The Reality: You need extensive infrastructure to make it reliable:
- Attention management
- Context preservation
- Task decomposition
- Error recovery
- Safety guardrails
The Deeper Challenge: Different AI providers think fundamentally differently:
- Same prompt → Completely different architectures
- No consensus on basic approaches
- Cross-provider learning often makes things worse
- Each AI's biases compound when combined
What This Means: The final VibeCleaner implementation is a hybrid of:
- Codex's classification ideas (simplified to basic rules)
- Claude's pragmatic file handling (with safety checks)
- Manual fallbacks when both approaches fail
- Cherry-picked working code from dozens of failed attempts
The Future: With community effort, we can build the missing infrastructure to make vibe coding practical. But we need to acknowledge that different AI agents may never fully align on implementation strategies.
- Start Small: Test on folders with <100 files first
- Always Preview: Use `--dry-run` before any operation
- Backup First: AI can misunderstand dramatically
- Watch Mode: Better for real-time organization than bulk cleaning
- Manual Fallback: When AI fails, use traditional commands
Important Note: This tool is experimental. VibeCleaner represents both the potential and current limitations of agentic AI workflows. Use with caution on important files. Always maintain backups.