VibeCleaner: AI-Powered Downloads Folder Organizer

The Vibe Coding Experiment

A Real-World Test: Can we use "Vibe Coding" (describing what we want in plain English to AI) to build a "Vibe Cleaner" (a tool that organizes messy Downloads folders through natural language)?

My Experience: Mixed results. This repository represents both the promise and current limitations of agentic AI workflows.

About

VibeCleaner is an experimental AI-powered tool that attempts to understand natural language requests to organize your Downloads folder. It uses Claude or Codex AI to interpret your intent and execute file organization tasks.

Important Context: This is my first public repository, born from testing whether AI agents can handle real-world file management. The results revealed significant challenges that are documented here for transparency and collaborative improvement.

🎯 What VibeCleaner Does

VibeCleaner transforms your chaotic Downloads folder into an organized file system by:

  • Auto-Organization: Sorts files into categorized folders (Documents, Images, Videos, Archives, etc.)
  • Duplicate Detection: Identifies and removes duplicate files to save disk space
  • Old File Management: Archives or deletes files older than specified periods
  • Smart Filtering: Recognizes file patterns and sorts based on customizable rules
  • Safe Operations: Dry-run mode shows what will happen before making changes
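The core auto-organization step (sort by extension, with a dry-run preview) can be sketched in a few lines. This is an illustrative sketch, not VibeCleaner's actual implementation; the category mapping here is hypothetical, while the real rules live in .vibecleaner.yml.

```python
from pathlib import Path
import shutil

# Hypothetical mapping for illustration; the real mapping is user-configurable
CATEGORIES = {
    "Documents": {".pdf", ".docx", ".txt"},
    "Images": {".jpg", ".png", ".gif"},
    "Archives": {".zip", ".rar", ".7z"},
}

def sort_by_extension(folder: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Plan (and, unless dry_run, apply) extension-based moves."""
    moves = []
    for f in folder.iterdir():
        if not f.is_file():
            continue
        for category, exts in CATEGORIES.items():
            if f.suffix.lower() in exts:
                dest = folder / category / f.name
                moves.append((f, dest))
                if not dry_run:
                    dest.parent.mkdir(exist_ok=True)
                    shutil.move(str(f), str(dest))
                break
    return moves
```

With dry_run=True (the safe default), the function only returns the planned moves, so the user can review them before anything touches the disk.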

✨ Key Features

  • 🤖 AI-Powered Natural Language Interface:

    • Talk to VibeCleaner like a human - no commands to memorize
    • Understands requests like "organize my downloads" or "delete old stuff"
    • Works with Claude or Codex AI assistants
    • Interactive chat mode for conversational cleaning
    • Smart fallback to rule-based cleaning if AI unavailable
  • File Type Recognition: Automatically categorizes files by extension and content

    • Documents (PDF, DOCX, TXT, etc.)
    • Images (JPG, PNG, GIF, etc.)
    • Videos (MP4, AVI, MKV, etc.)
    • Archives (ZIP, RAR, 7Z, etc.)
    • Code files (PY, JS, HTML, etc.)
    • Audio (MP3, WAV, FLAC, etc.)
  • Intelligent Rules Engine:

    • Move files based on age
    • Sort by file size thresholds
    • Custom naming patterns
    • Subfolder organization by date
  • Safety Features:

    • Preview mode to see changes before execution
    • Undo functionality for recent operations
    • Whitelist important files
    • Backup before deletion option
  • Scheduling Options:

    • Run on system startup
    • Periodic cleaning (hourly, daily, weekly)
    • Watch folder for real-time organization
    • Manual trigger via CLI or GUI
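The age-based rules above reduce to a simple modification-time filter. A minimal sketch (assumed helper, not the tool's actual code):

```python
import time
from pathlib import Path

def files_older_than(folder: Path, days: int) -> list[Path]:
    """Return files whose modification time is more than `days` days old."""
    cutoff = time.time() - days * 86400  # seconds per day
    return [f for f in folder.iterdir()
            if f.is_file() and f.stat().st_mtime < cutoff]
```

The returned list can then feed either an archive step (move to a dated subfolder) or a delete step, with the same dry-run preview as everything else.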

📋 Use Cases

  • Daily Cleanup: Automatically organize downloads every day
  • Project Files: Sort project files into dedicated folders
  • Media Management: Organize photos and videos by date
  • Document Filing: Keep documents sorted by type and date
  • Disk Space Recovery: Remove old and duplicate files
  • Download History: Maintain organized archive of downloads

⚠️ Reality Check: Limitations Discovered

Through testing, I discovered significant challenges with current AI agents:

What Went Wrong

  1. Large Folder Failure: Both Claude and Codex lose context with too many files
  2. Attention Drift: AI agents "forget" instructions during complex operations
  3. Inconsistent Results: Same prompt, different outcomes
  4. Sandbox Limitations: Codex has severe restrictions that limit real-world usage
  5. Context Window Issues: Can't handle folders with 1000+ files effectively

What Worked (Sometimes)

  • Claude: Excellent at generating code, struggled with file management logic
  • Codex: Good on small folders despite sandbox limitations
  • Fallback Rules: Had to implement non-AI backup logic for reliability

The Solution (Workarounds)

This repository packages the working scripts generated by AI with:

  • Task decomposition into smaller passes
  • Folder learning systems
  • OCR integration for document classification
  • Extensive guardrails and safety checks
  • Manual fallback for when AI fails

📊 Empirical Findings from Real Testing

The Escalating Complexity Problem

Phase 1: Initial Success (Small Folder)

  • Started with: Smaller test folder (~100 files)
  • Result: AI performed reasonably well
  • Organization: Clean categorization achieved

Phase 2: The Main Downloads Folder

  • Initial count: ~2,800 files
  • Archive files: Multiple large ZIPs and RARs
  • The explosion: After extracting archives, file count multiplied
  • Current state: 579 files (after partial cleanup attempts)

Phase 3: The OCR Disaster

I asked the AI to implement OCR → Interpretation → Smart Renaming:

  • Goal: Read document content, understand context, rename intelligently
  • Reality: Complete chaos
    • Misinterpreted documents
    • Created nonsensical names
    • Lost original file references
    • Duplicate naming conflicts
    • System nearly unusable

Test Environment 1: OneDrive Workspace (Small Scale Success)

  • Starting File Count: 579 files
  • Organization Status: 12 manually created subdirectories
  • Unprocessed Archives: 13 ZIP files still pending
  • File Type Diversity: 50+ different extensions
  • AI Performance: Handled well due to manageable scale
  • Result: Success - AI could maintain context and follow instructions effectively

Test Environment 2: Main Downloads Folder (Archive Explosion Disaster)

  • Starting File Count: 2,800 files (manageable scale initially)
  • The Fatal Request: "Unzip downloads" - asked AI to extract archive files
  • Archive Explosion: ZIP extractions caused massive file count explosion to 15,323 files
  • Context Loss Crisis: AI lost the ability to group related files that came from the same archives
  • Folder Meaning Collapse:
    • AI couldn't understand that files extracted from "Project_Alpha.zip" belonged together
    • Lost semantic relationships between related documents
    • No experience handling the concept that folder structure inside archives has meaning
  • AI Renaming Catastrophe Evidence:
    • Version number chaos: Company_A_2024-12-26_v1_v2_v3_[...through v36] (9).pdf
    • Automated prefixes: Files with 20250824_AGR_, 20250824_PRES_, 20250824_TECH_ showing failed categorization
    • Lost all original archive context and semantic groupings
  • Scale Impact: 5.4x file explosion (2,800 → 15,323) broke AI's processing capabilities
  • Result: Complete organizational breakdown - files became unfindable, semantic relationships destroyed

What AI Struggled With

  1. Complex File Naming Patterns

    • Files like "Complete_with_DocuSign_*" (multiple variations)
    • Version numbers: "Report (1).xlsx" through "Report (6).xlsx"
    • Date formats: Multiple inconsistent date patterns
    • Similar names: "2021 Tax Return Person_A.pdf" vs "2021 Tax Return Person_A1.pdf"
  2. Context-Dependent Organization

    • Same file type belongs in different folders based on content
    • PDFs could be: Tax docs, Contracts, Reports, or Insurance
    • Excel files could be: Financial, Analytics, or Technical
    • AI couldn't determine context without reading file content
  3. Business Logic Complexity

    • Corporate files require domain knowledge
    • Legal vs Financial vs HR distinctions
    • Temporal relevance (2021 vs 2022 tax files)
    • Relationship between files (invoice + receipt pairs)
  4. Scale Explosion Issues

    • Started manageable (100 files) → Success
    • Scaled to 2,800 files → Performance degraded
    • Archive extraction → Exponential growth
    • Context window shattered at ~100 files
    • AI completely lost at 500+ files
    • Each iteration made organization worse
    • OCR attempt created irreversible chaos
  5. The Compounding Failure Pattern

    • Small folder success gave false confidence
    • Scaled up to main Downloads → partial failure
    • Added complexity (OCR) → complete breakdown
    • Each "smart" feature multiplied failure modes
    • Recovery required manual intervention

Success Patterns

  • Small Batches: Processing 20-30 files at a time worked
  • Clear Extensions: .jpg, .png consistently categorized
  • Simple Rules: Date-based archival was reliable
  • Duplicate Detection: Hash-based matching worked well
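Hash-based duplicate matching was one of the few consistently reliable pieces. A sketch of the approach under the usual assumptions (content-identical means duplicate; hypothetical helper, not the shipped code):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(folder: Path) -> list[list[Path]]:
    """Group files by SHA-256 content hash; any group of 2+ is a duplicate set."""
    by_hash = defaultdict(list)
    for f in sorted(folder.rglob("*")):
        if f.is_file():
            digest = hashlib.sha256(f.read_bytes()).hexdigest()
            by_hash[digest].append(f)
    return [group for group in by_hash.values() if len(group) > 1]
```

Because hashing is deterministic, this step does not suffer from the consistency problems described above: the same folder always yields the same duplicate groups.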

🔬 The Cross-Provider Learning Experiment

Methodology

I gave two different folders to Codex and Claude, then made them learn from each other's approaches. The evolution of their strategies revealed fundamental differences in how AI agents approach problems.

Codex's Approach (More Academic/Complex)

  • Machine Learning Focus: Proposed training classification models on file patterns
  • Learning Gates: Suggested implementing attention gates to maintain context
  • Pattern Recognition: Wanted to build neural networks for file type detection
  • Embedding Systems: Proposed vector embeddings for semantic file similarity
  • Reinforcement Learning: Suggested reward systems for correct categorization
  • Overly Complex: Often over-engineered solutions for simple problems

Claude's Approach (More Practical/Direct)

  • Rule-Based Systems: Started with deterministic rules before adding AI
  • Hierarchical Organization: Focused on folder tree structures
  • Metadata Extraction: Emphasized using file properties and timestamps
  • Batch Processing: Suggested chunking strategies for large folders
  • Fallback Mechanisms: Always included non-AI backup plans
  • Pragmatic: Preferred simple solutions that worked reliably
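The chunking idea Claude kept returning to is straightforward: never hand the AI more files than its context can hold. One way such batching might look (illustrative sketch; the batch size is an assumption, tuned per provider):

```python
from itertools import islice
from typing import Iterable, Iterator

def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive fixed-size batches, so each AI call sees a bounded file list."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch
```

Feeding 20-30 files per batch (the range that worked in testing, per the Success Patterns below) keeps each pass inside the model's reliable operating range.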

The Learning Process

When I shared Codex's ideas with Claude and vice versa:

  1. Claude + Codex Ideas: Claude simplified Codex's ML proposals into practical heuristics
  2. Codex + Claude Ideas: Codex tried to mathematically formalize Claude's rules
  3. Convergence Issues: They rarely agreed on implementation details
  4. Different Priorities: Codex optimized for accuracy, Claude for reliability

Key Insight

The two AIs had fundamentally different problem-solving philosophies:

  • Codex: "Let's build a system that learns"
  • Claude: "Let's build a system that works"

This divergence highlights a core challenge in vibe coding: different AI agents interpret the same request through vastly different lenses, making consistent results nearly impossible.

⚠️ Critical Lesson: The Scaling Trap

The Two-Environment Failure Pattern

Test Environment 1 (OneDrive Workspace - 579 files):

  1. Success at Small Scale: AI handled 579 files effectively
  2. Maintained Context: Could follow instructions and maintain organization
  3. Result: Usable system with 12 organized subdirectories

Test Environment 2 (Main Downloads - The Archive Explosion):

  1. Started Manageable: 2,800 files - initially within AI's capabilities
  2. Fatal Request: "Unzip downloads" - asked AI to extract archives
  3. Archive Explosion: File count exploded 5.4x (2,800 → 15,323 files)
  4. Context Collapse: AI lost understanding of semantic file relationships from archives
  5. Total System Failure: Files became unfindable, all semantic groupings destroyed

The Progressive Failure Pattern

  1. Small Success: 100 files organized perfectly → False confidence
  2. Medium Attempt: 500-2,800 files partially worked → Pushed forward
  3. Feature Creep: Added OCR "smartness" → Amplified existing problems
  4. Scale Disaster: Large folder (15,323 files) → Complete system breakdown

The OCR Catastrophe (Test Environment 2)

When I asked AI to "make it smarter" with OCR:

  • Request: "Read PDFs, understand content, rename intelligently"
  • AI Response: Enthusiastically implemented OCR + NLP + Renaming
  • Catastrophic Results:
    • "Invoice_2023.pdf" → "document_about_financial_transactions_possibly_related_to_business.pdf"
    • "Contract_Draft.pdf" → "legal_text_with_multiple_parties_mentioned.pdf"
    • Version explosion: Company_A_2024-12-26_v1_v2_v3_[...through v36] (9).pdf
    • Automated categorization gone wrong: Random 20250824_AGR_, 20250824_PRES_, 20250824_TECH_ prefixes
    • Lost all original context and meaningful file names
    • Created hundreds of naming conflicts
    • Generated massive empty folder structures that AI couldn't clean up
    • Files became completely unfindable in the 15,323-file chaos

Why This Matters

The Vibe Coding Trap: AI's enthusiasm to help can destroy your data. Every "smart" feature is a potential catastrophe at scale. The same AI that brilliantly organizes 100 files will confidently destroy 15,323 files with equal enthusiasm, creating chaos that's nearly impossible to recover from manually.

Root Cause Analysis: The Job Fit Problem

Token Scale Mismatch: AI agents are fundamentally designed for code and limited text processing. Documents and OCR content require millions of tokens at scale - far beyond what current AI agents can efficiently handle. They resist or fail catastrophically when pushed beyond their optimal token ranges.

Instruction Following Breakdown: The massive failure wasn't just about scale - it revealed severe job fit issues. AI agents optimized for code understanding showed tremendous weakness in instruction following for document organization tasks. The tooling and specialization gap became apparent.

Specialization Requirements: This experiment proves that AI agents require:

  • Proper Task Specialization: File organization needs different AI architectures than code analysis
  • Well-Crafted Tooling: Generic AI agents need specialized tools and constraints for domain-specific tasks
  • Token-Aware Design: Systems must account for the token limitations and processing capabilities of the underlying AI
  • Archive Context Awareness: AI agents lack understanding that files from the same archive belong together semantically

The Archive Problem: AI agents have no concept that when you extract "ProjectAlpha.zip", all the resulting files maintain semantic relationships. They treat each extracted file as independent, destroying the original organizational intent embedded in the archive structure.

The Deeper Lesson: Vibe coding works when the AI agent's design matches the task domain. Using code-optimized agents for document management is like using a text editor to edit video - technically possible, spectacularly ineffective. The archive explosion disaster reveals that AI agents lack fundamental concepts about file relationships that humans take for granted.

🚀 Quick Start

Prerequisites

  • AI Provider: Install Claude CLI or Codex CLI for AI features
    • Claude: pip install claude-cli (or your Claude installation method)
    • Codex: npm install -g codex-cli (or your Codex installation method)
    • Set environment variables if needed:
      • VIBECLEANER_CLAUDE_CMD=claude
      • VIBECLEANER_CODEX_CMD=codex

Installation

# Install from PyPI
pip install vibecleaner

# Or install from source
git clone https://github.com/yourusername/vibecleaner.git
cd vibecleaner
pip install .

🤖 AI-Powered Usage (Recommended for Non-Technical Users)

# Initialize VibeCleaner
vibecleaner init

# Ask in natural language
vibecleaner ask "clean up my messy downloads folder"
vibecleaner ask "find and delete duplicate photos"
vibecleaner ask "organize PDFs from last month"
vibecleaner ask "what files are taking up the most space?"

# Interactive chat mode
vibecleaner chat
# Then chat naturally:
# You: my downloads are a mess, help!
# Assistant: I'll help you organize...

# Apply AI suggestions automatically
vibecleaner ask "remove old files" --apply

Manual Usage (Traditional Commands)

# Preview what will be cleaned (dry run)
vibecleaner clean --dry-run

# Clean downloads folder
vibecleaner clean

# Clean with specific rules
vibecleaner clean --older-than 30 --duplicates

# Watch folder for real-time organization
vibecleaner watch ~/Downloads

# Schedule automatic cleaning
vibecleaner schedule --daily --time 09:00

Configuration

Create a .vibecleaner.yml in your home directory:

# Download folder path (default: ~/Downloads)
downloads_path: ~/Downloads

# Organization rules
organize:
  Documents:
    extensions: [pdf, doc, docx, txt, odt]
    path: ~/Documents/Downloads
  Images:
    extensions: [jpg, jpeg, png, gif, svg, webp]
    path: ~/Pictures/Downloads
  Videos:
    extensions: [mp4, avi, mkv, mov, wmv]
    path: ~/Videos/Downloads
  Archives:
    extensions: [zip, rar, 7z, tar, gz]
    path: ~/Downloads/Archives

# Cleanup rules
cleanup:
  delete_after_days: 90
  archive_after_days: 30
  min_file_size: 1MB  # Ignore small files
  remove_duplicates: true
  
# Safety settings
safety:
  dry_run_default: true
  backup_before_delete: true
  whitelist_patterns:
    - "important_*"
    - "*.key"
    - "*.license"
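The whitelist_patterns above are shell-style globs; a check like the following could guard every delete path. This is a hypothetical helper for illustration, not the tool's actual code:

```python
from fnmatch import fnmatch

# Patterns mirroring the whitelist_patterns example above (hypothetical defaults)
WHITELIST = ["important_*", "*.key", "*.license"]

def is_whitelisted(filename: str, patterns: list[str] = WHITELIST) -> bool:
    """True if the file matches any whitelist glob and must never be deleted."""
    return any(fnmatch(filename, p) for p in patterns)
```

Running the check before any destructive operation is one of the deterministic guardrails that no amount of AI "enthusiasm" can bypass.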

🛠️ Advanced Features

Custom Rules

# Add custom organization rule
vibecleaner rule add --name "Screenshots" \
  --pattern "Screen Shot*" \
  --destination ~/Pictures/Screenshots

# Remove old downloads
vibecleaner clean --older-than 60d --min-size 100MB

# Find and remove duplicates
vibecleaner duplicates --remove --keep-newest

Scheduling (Cross-platform)

# Linux/Mac: Add to crontab
vibecleaner schedule --cron "0 9 * * *"

# Windows: Add to Task Scheduler  
vibecleaner schedule --windows --daily --time 09:00

# Run as daemon/service
vibecleaner daemon start

📊 Statistics & Reports

# Show cleaning statistics
vibecleaner stats

# Generate cleaning report
vibecleaner report --last-30-days

# Show disk space saved
vibecleaner savings

🔧 Command Reference

Command               Description
vibecleaner init      Initialize configuration
vibecleaner clean     Clean downloads folder
vibecleaner watch     Watch folder for real-time organization
vibecleaner schedule  Set up automatic cleaning
vibecleaner undo      Undo last cleaning operation
vibecleaner stats     Show cleaning statistics
vibecleaner config    Edit configuration

🛡️ Safety & Privacy

  • No Cloud Dependency: Works entirely offline
  • Local Processing: Your files never leave your machine
  • Transparent Operations: Full logs of all actions
  • Reversible Changes: Undo support for recent operations
  • No Data Collection: Zero telemetry or analytics

📝 License

Licensed under the MIT License. See the LICENSE file for details.

🤝 Call for Contributors: Help Make Vibe Coding Work

This project needs your help! The agentic workflow shows promise but requires significant improvement.

Areas Needing Help

  1. Context Management: Better handling of large folders (1000+ files)
  2. Attention Systems: Prevent AI from losing track during long operations
  3. Prompt Engineering: More robust prompts that work consistently
  4. Edge Cases: Handle unusual file types and folder structures
  5. Provider Integration: Better Claude/Codex integration, add more providers
  6. Learning Systems: Implement folder pattern recognition
  7. Testing: Test on diverse, real-world messy folders

How to Contribute

  1. Fork the repository
  2. Test on your own messy Downloads folder
  3. Document what breaks
  4. Submit fixes or improvements
  5. Share your experience with agentic workflows

Special Interest: If you've successfully implemented attention management or context preservation in agentic systems, your expertise would be invaluable.

See CONTRIBUTING.md for technical guidelines.

Further Reading

  • Deeper analysis and design recommendations: docs/REFLECTIONS.md
  • arXiv-style writeup of the experiment: docs/arxiv-draft.md

🎓 Lessons from Building with Vibe Coding

Key Insights

  1. Agents Need Boundaries: Unlimited scope leads to failure. Break tasks into small, well-defined chunks.

  2. Context is Everything: Current AI loses context quickly. Solution: Implement checkpoint systems.

  3. Trust but Verify: Always preview before execution. AI confidence doesn't equal correctness.

  4. Fallbacks are Essential: When AI fails (and it will), have deterministic rules ready.

  5. Prompt Management is Complex: What works once might fail next time. Version and test your prompts.

The Vibe Coding Reality

The Promise: Describe what you want, AI handles implementation.

The Reality: You need extensive infrastructure to make it reliable:

  • Attention management
  • Context preservation
  • Task decomposition
  • Error recovery
  • Safety guardrails

The Deeper Challenge: Different AI providers think fundamentally differently:

  • Same prompt → Completely different architectures
  • No consensus on basic approaches
  • Cross-provider learning often makes things worse
  • Each AI's biases compound when combined

What This Means: The final VibeCleaner implementation is a hybrid of:

  • Codex's classification ideas (simplified to basic rules)
  • Claude's pragmatic file handling (with safety checks)
  • Manual fallbacks when both approaches fail
  • Cherry-picked working code from dozens of failed attempts

The Future: With community effort, we can build the missing infrastructure to make vibe coding practical. But we need to acknowledge that different AI agents may never fully align on implementation strategies.

💡 Usage Tips

  • Start Small: Test on folders with <100 files first
  • Always Preview: Use --dry-run before any operation
  • Backup First: AI can misunderstand dramatically
  • Watch Mode: Better for real-time organization than bulk cleaning
  • Manual Fallback: When AI fails, use traditional commands

Important Note: This tool is experimental. VibeCleaner represents both the potential and current limitations of agentic AI workflows. Use with caution on important files. Always maintain backups.
