@realmarcin
Repository cleanup: consolidates legacy data directories into an organized data/ATTIC/ archive to improve repository clarity and maintainability.

Changes

Archived Directories (~29 MB)

From repository root → data/ATTIC/root_downloads/:

  • downloads_by_column_enhanced/ (14 MB)
  • downloads_by_column_combined/ (912 KB)
  • downloads_by_column_enhanced_combined/ (620 KB)
  • test_downloads/ (4.0 MB)
  • old/ (9.1 MB)

From data/ → data/ATTIC/:

  • sheets_concatenated/ (84 KB)
  • validated_extracted/ → validated_extracted_html_only/ (132 KB)
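
For reference, the moves listed above can be sketched as a small script. This is an illustration only, not the commands used in this PR: the function name `archive_legacy_dirs` and the `dry_run` flag are hypothetical, and the two data/ entries are assumed to have lived at `data/sheets_concatenated` and `data/validated_extracted`. In a git checkout, `git mv` would be the more natural tool; `shutil.move` is used here to keep the sketch self-contained.

```python
from pathlib import Path
import shutil

# Source (relative to the repo root) -> archive destination, per the lists above.
ARCHIVE_MOVES = {
    "downloads_by_column_enhanced": "data/ATTIC/root_downloads/downloads_by_column_enhanced",
    "downloads_by_column_combined": "data/ATTIC/root_downloads/downloads_by_column_combined",
    "downloads_by_column_enhanced_combined": "data/ATTIC/root_downloads/downloads_by_column_enhanced_combined",
    "test_downloads": "data/ATTIC/root_downloads/test_downloads",
    "old": "data/ATTIC/root_downloads/old",
    "data/sheets_concatenated": "data/ATTIC/sheets_concatenated",
    # This one is renamed as part of the move.
    "data/validated_extracted": "data/ATTIC/validated_extracted_html_only",
}


def archive_legacy_dirs(repo_root, dry_run=True):
    """Return the (source, destination) pairs that were, or would be, moved."""
    root = Path(repo_root)
    moved = []
    for src_rel, dst_rel in ARCHIVE_MOVES.items():
        src, dst = root / src_rel, root / dst_rel
        if not src.is_dir():
            continue  # already archived, or never present in this checkout
        if dst.exists():
            raise FileExistsError(f"refusing to overwrite {dst}")
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
        moved.append((src_rel, dst_rel))
    return moved
```

Calling `archive_legacy_dirs(".")` with the default `dry_run=True` only reports what would move, which makes it easy to review the plan before touching the tree.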

Documentation

Created:

  • data/ATTIC/README.md - Comprehensive 500+ line guide documenting:
    • Directory organization with size breakdown
    • Migration timeline and rationale (Dec 19, 2024)
    • Current vs deprecated pipeline comparison
    • Archive retention guidance
    • Low-priority cleanup action items

Updated:

  • CLAUDE.md - Added ATTIC structure to Data Organization section
  • CLAUDE.md - Added Important Note about legacy data archival

Verification Performed

✅ GitHub Actions workflows - No references to deprecated dirs
✅ Claude Code agents - No active references
✅ Makefiles - No active usage (only example comments)
✅ Source code - No breakage (overridden defaults)
✅ Documentation - No breaking references
⚠️ PRESERVED utils/ - Actively used by d4d_to_synapse_table.yml workflow
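
A reference check like the one above could be automated roughly as follows. This is a sketch under assumptions, not the procedure actually run for this PR: `find_references`, the scanned file types, and the rule of skipping lines that mention ATTIC (treated as pointers to the new archive location) are all hypothetical choices.

```python
from pathlib import Path
import re

# Directory names archived in this PR.
DEPRECATED_DIRS = [
    "downloads_by_column_enhanced",
    "downloads_by_column_combined",
    "downloads_by_column_enhanced_combined",
    "test_downloads",
    "old",
    "sheets_concatenated",
    "validated_extracted",
]

# Assumed scan scope; adjust to the repository's actual file types.
SCAN_SUFFIXES = {".py", ".yml", ".yaml", ".md"}
SCAN_NAMES = {"Makefile"}
SKIP_PARTS = {".git", "ATTIC"}

# Match "<name>/" only when it is not the tail of a longer word,
# so "threshold/" does not count as a reference to "old/".
PATTERN = re.compile(
    r"(?<![\w-])(" + "|".join(re.escape(n) for n in DEPRECATED_DIRS) + r")/"
)


def find_references(repo_root):
    """Return (relative_path, line_number, directory_name) for each hit."""
    root = Path(repo_root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if SKIP_PARTS & set(rel.parts):
            continue
        if path.suffix not in SCAN_SUFFIXES and path.name not in SCAN_NAMES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if "ATTIC" in line:
                continue  # assumed to point at the new archive location
            for match in PATTERN.finditer(line):
                hits.append((rel.as_posix(), lineno, match.group(1)))
    return hits
```

An empty result from `find_references(".")` would correspond to the "no references" findings above; any hit still needs a human look, since example comments (as in the Makefiles) also match.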

Testing

✅ Schema validation passes (make test-schema)
✅ Python unit tests pass (6 tests OK, 3 skipped)
✅ Repository state verified

Impact

  • Root directory: 5 deprecated directories moved out of the root into data/ATTIC/root_downloads/
  • Data organization: Active pipeline + organized ATTIC archive
  • Documentation: Clear migration path and historical reference
  • No breaking changes: All active paths preserved
  • Total archived: ~29 MB from this cleanup + existing ATTIC content
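
The ~29 MB figure can be cross-checked against the per-directory sizes listed under Archived Directories. The snippet below is a quick sanity check rather than a measurement: the helper names are hypothetical, and it assumes the sizes are `du -h` style units with 1 MB = 1024 KB.

```python
# Sizes exactly as listed in this PR description.
ARCHIVED_SIZES = {
    "downloads_by_column_enhanced": "14 MB",
    "downloads_by_column_combined": "912 KB",
    "downloads_by_column_enhanced_combined": "620 KB",
    "test_downloads": "4.0 MB",
    "old": "9.1 MB",
    "sheets_concatenated": "84 KB",
    "validated_extracted": "132 KB",
}

# Assumption: binary units, as reported by tools like `du -h`.
UNIT_TO_MB = {"KB": 1 / 1024, "MB": 1.0}


def to_mb(size):
    """Convert a string such as '912 KB' to a float number of MB."""
    value, unit = size.split()
    return float(value) * UNIT_TO_MB[unit]


def total_archived_mb():
    """Sum all listed sizes in MB."""
    return sum(to_mb(size) for size in ARCHIVED_SIZES.values())
```

Under these assumptions `total_archived_mb()` comes to a little under 29, consistent with the rounded total quoted above.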

Context

After establishing the current D4D pipeline (claudecode_agent as canonical method) and standardizing all paths, these legacy directories from earlier extraction experiments were consolidated into data/ATTIC/ for historical reference.

See data/ATTIC/README.md for complete archival documentation and timeline.

Related: Plan mode directory cleanup analysis

@realmarcin realmarcin merged commit 6818a91 into main Dec 20, 2025
5 checks passed
@realmarcin realmarcin deleted the clean-dirs branch December 20, 2025 07:29