|
| 1 | +### Comet Asset Cleanup Script |
| 2 | + |
| 3 | +A safe, automatable Python utility that reduces cloud storage usage by identifying and deleting old experiment assets in Comet. |
| 4 | + |
| 5 | +This script can: |
| 6 | +- Iterate experiments in a workspace (single project or all projects) |
| 7 | +- List assets per experiment and evaluate age |
| 8 | +- Save a JSON "deletion plan" before any destructive action |
| 9 | +- Delete assets older than a threshold (opt-in, with confirmation) |
| 10 | + |
| 11 | +By default, the script runs in dry-run mode and never deletes anything unless you pass `--execute`. |
| 12 | + |
| 13 | +--- |
| 14 | + |
| 15 | +### Requirements |
| 16 | +- Python 3.9+ |
| 17 | +- Poetry (recommended) |
| 18 | +- Comet API key with access to the target workspace/projects |
| 19 | + |
| 20 | +Environment variables (recommended via `.env`): |
| 21 | +- `COMET_API_KEY` (required) |
| 22 | +- `COMET_WORKSPACE` (optional if passed via `--workspace`) |
| 23 | +- `COMET_PROJECT` (optional; omit to process all projects) |
| 24 | + |
| 25 | +--- |
| 26 | + |
| 27 | +### Setup |
| 28 | +1) Install dependencies (project uses Poetry): |
| 29 | + |
| 30 | +```bash |
| 31 | +poetry install --no-root |
| 32 | +``` |
| 33 | + |
| 34 | +2) Create a `.env` in the repo root (or export env vars in your shell): |
| 35 | + |
| 36 | +```bash |
| 37 | +echo "COMET_API_KEY=your_api_key_here" >> .env |
| 38 | +echo "COMET_WORKSPACE=your_workspace_name" >> .env |
| 39 | +# COMET_PROJECT is optional; omit to process all projects |
| 40 | +``` |
| 41 | + |
| 42 | +--- |
| 43 | + |
| 44 | +### Usage |
| 45 | +General form: |
| 46 | + |
| 47 | +```bash |
| 48 | +poetry run python scripts/comet_asset_cleanup.py \ |
| 49 | + --workspace <WORKSPACE> \ |
| 50 | + [--project <PROJECT>] \ |
| 51 | + [--days-threshold 365] \ |
| 52 | + [--include-archived] \ |
| 53 | + [--dry-run] [--execute] [--yes] \ |
| 54 | + [--batch-size 10] [--delay 1.0] |
| 55 | +``` |
| 56 | + |
| 57 | +Flags: |
| 58 | +- `--workspace`: Workspace to process (required if not set in env) |
| 59 | +- `--project`: Project to process; omit to process ALL projects in the workspace |
| 60 | +- `--days-threshold`: Assets older than this many days are targeted (default: 365) |
| 61 | +- `--include-archived`: Include archived experiments |
| 62 | +- `--dry-run`: Force simulation mode (default behavior) |
| 63 | +- `--execute`: Perform deletions (requires confirmation unless `--yes`) |
| 64 | +- `--yes`: Skip interactive confirmation (non-interactive/scheduled runs) |
| 65 | +- `--batch-size`: Number of deletions per batch (default: 10) |
| 66 | +- `--delay`: Seconds to sleep between batches (default: 1.0) |
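The flag list above can be sketched with `argparse` as follows. This is a minimal illustration, not the script's actual parser: the function names `build_parser` and `parse_args` are hypothetical, and the rule that `--dry-run` wins when combined with `--execute` is an assumption based on the "force simulation mode" wording.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Comet asset cleanup (sketch)")
    # Fall back to the documented environment variables when flags are omitted.
    parser.add_argument("--workspace", default=os.environ.get("COMET_WORKSPACE"))
    parser.add_argument("--project", default=os.environ.get("COMET_PROJECT"))
    parser.add_argument("--days-threshold", type=int, default=365)
    parser.add_argument("--include-archived", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--execute", action="store_true")
    parser.add_argument("--yes", action="store_true")
    parser.add_argument("--batch-size", type=int, default=10)
    parser.add_argument("--delay", type=float, default=1.0)
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.workspace:
        parser.error("--workspace is required (or set COMET_WORKSPACE)")
    # Assumption: deletions run only with --execute, and --dry-run overrides it.
    args.dry_run = args.dry_run or not args.execute
    return args
```

With this arrangement, a run without `--execute` can never reach the deletion step, which matches the dry-run default described earlier.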
| 67 | + |
| 68 | +Examples: |
| 69 | + |
| 70 | +Dry-run, one project: |
| 71 | +```bash |
| 72 | +poetry run python scripts/comet_asset_cleanup.py \ |
| 73 | + --workspace your-ws --project your-proj --days-threshold 365 |
| 74 | +``` |
| 75 | + |
| 76 | +Dry-run, all projects in a workspace: |
| 77 | +```bash |
| 78 | +poetry run python scripts/comet_asset_cleanup.py \ |
| 79 | + --workspace your-ws --days-threshold 365 |
| 80 | +``` |
| 81 | + |
| 82 | +Execute, include archived, non-interactive: |
| 83 | +```bash |
| 84 | +poetry run python scripts/comet_asset_cleanup.py \ |
| 85 | + --workspace your-ws --project your-proj --days-threshold 365 \ |
| 86 | + --include-archived --execute --yes |
| 87 | +``` |
| 88 | + |
| 89 | +--- |
| 90 | + |
| 91 | +### What gets saved |
| 92 | +- A deletion plan JSON is always written before any deletion (also in dry runs): |
| 93 | + |
| 94 | + - File name pattern: `asset_deletion_plan_<workspace>_<project>_<timestamp>.json` |
| 95 | + - Structure: |
| 96 | + |
| 97 | +```json |
| 98 | +{ |
| 99 | + "my-workspace": { |
| 100 | + "my-project": { |
| 101 | + "abcd1234_experimentkey": [ |
| 102 | + "gradient_layer3.4_weight.json", |
| 103 | + "..." |
| 104 | + ] |
| 105 | + } |
| 106 | + } |
| 107 | +} |
| 108 | +``` |
| 109 | + |
| 110 | +- Logs are written to `comet_asset_cleanup.log` and printed to console. |
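Writing the deletion plan described above could look like the sketch below. The function name `save_deletion_plan`, the `all_projects` placeholder used when no project is given, and the timestamp format are assumptions; only the file name pattern and the nested workspace/project/experiment structure come from this README.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_deletion_plan(plan, workspace, project=None, out_dir="."):
    """Write the plan as JSON and return the path of the file created.

    `plan` is expected to be nested as workspace -> project -> experiment -> [asset names].
    """
    # Assumption: a placeholder label stands in for the project in all-project runs.
    project_label = project or "all_projects"
    # Assumption: a sortable UTC timestamp; the real format is not documented here.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    file_name = f"asset_deletion_plan_{workspace}_{project_label}_{timestamp}.json"
    path = Path(out_dir) / file_name
    path.write_text(json.dumps(plan, indent=2), encoding="utf-8")
    return path
```

Saving the plan before any delete call means an interrupted run still leaves a record of what was targeted.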
| 111 | + |
| 112 | +--- |
| 113 | + |
| 114 | +### How it works (Comet REST API) |
| 115 | +The script uses Comet REST endpoints to discover projects, experiments, and assets, and to delete assets: |
| 116 | +- Get Projects (used when `--project` is omitted): |
| 117 | + - [Get Projects](https://www.comet.com/docs/v2/api-and-sdk/rest-api/read-endpoints/#get-projects) |
| 118 | +- Get Experiments in a project: |
| 119 | + - [Get Experiments](https://www.comet.com/docs/v2/api-and-sdk/rest-api/read-endpoints/#get-experiments) |
| 120 | +- List assets for an experiment: |
| 121 | + - [Get Asset List](https://www.comet.com/docs/v2/api-and-sdk/rest-api/read-endpoints/#get-asset-list) |
| 122 | +- Delete an asset: |
| 123 | + - [Delete an Asset](https://www.comet.com/docs/v2/api-and-sdk/rest-api/write-endpoints/#delete-an-asset) |
| 124 | + |
| 125 | +The script computes the threshold date/time and selects assets where `createdAt` is older than the given threshold. |
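The threshold check can be illustrated with a small filter. This sketch assumes each asset is a dict whose `createdAt` field holds epoch milliseconds; verify the unit against the Get Asset List response before relying on it. The function name `select_old_assets` and the `fileName` key used in the example are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def select_old_assets(assets, days_threshold, now=None):
    """Return the assets whose createdAt is older than `days_threshold` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_threshold)
    # Assumption: createdAt is epoch milliseconds, so compare in the same unit.
    cutoff_ms = cutoff.timestamp() * 1000
    return [
        asset
        for asset in assets
        # Assets without a timestamp are skipped rather than deleted.
        if asset.get("createdAt") is not None and asset["createdAt"] < cutoff_ms
    ]


if __name__ == "__main__":
    reference = datetime(2024, 1, 1, tzinfo=timezone.utc)
    sample = [
        {"fileName": "old.json", "createdAt": 1_600_000_000_000},  # September 2020
        {"fileName": "new.json", "createdAt": 1_701_000_000_000},  # November 2023
    ]
    for asset in select_old_assets(sample, 365, now=reference):
        print(asset["fileName"])
```

Passing `now` explicitly keeps the cutoff reproducible in tests; skipping assets with no timestamp is the conservative choice for a destructive tool.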
| 126 | + |
| 127 | +--- |
| 128 | + |
| 129 | +### Safety and scheduling |
| 130 | +- Default behavior is non-destructive dry run. |
| 131 | +- In execute mode, the script saves a deletion plan JSON, then requires you to type `DELETE` at the console to confirm, unless `--yes` is passed. |
| 132 | +- Consider scheduling the script as a cron job: |
| 133 | + |
| 134 | +```cron |
| 135 | +# Run daily at 2:00am in execute mode across all projects |
| 136 | +# A crontab entry must be a single line; cron does not support |
| 137 | +# backslash line continuations. |
| 138 | +0 2 * * * cd /path/to/repo && /usr/local/bin/poetry run python scripts/comet_asset_cleanup.py --workspace your-ws --days-threshold 365 --execute --yes >> cleanup.cron.log 2>&1 |
| 139 | +``` |
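The confirmation gate mentioned in the safety list could be sketched as below. The function name `confirm_deletion`, the prompt text, and the case-sensitive match are assumptions; the README only states that `DELETE` is the confirmation and that `--yes` skips it.

```python
def confirm_deletion(asset_count, assume_yes=False, read_input=input):
    """Return True only if the run may proceed with deletions."""
    if assume_yes:
        # Corresponds to --yes: no prompt, intended for scheduled runs.
        return True
    answer = read_input(
        f"About to delete {asset_count} assets. Type DELETE to continue: "
    )
    # Assumption: the match is exact and case-sensitive.
    return answer.strip() == "DELETE"
```

Because cron provides no interactive terminal, a scheduled run in execute mode needs `--yes`, as in the crontab entry above.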
| 140 | + |
| 141 | +--- |
| 142 | + |
| 143 | +### Notes |
| 144 | +- API rate limits: adjust `--batch-size` and `--delay` if needed. |
| 145 | +- Authentication: the script sends your API key in the `Authorization` header (read from `COMET_API_KEY`). |
| 146 | +- Project discovery requires `Get Projects` permissions in the workspace. |
| 147 | +- Test in dry-run mode with a small `--days-threshold` before enabling execute mode. |
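How `--batch-size` and `--delay` might interact with rate limits is sketched below. The function `delete_in_batches` and its `delete_one` callback are hypothetical; the callback stands in for whatever call the script makes to the Delete an Asset endpoint, and the choice to record failures and continue is an assumption.

```python
import time


def delete_in_batches(asset_ids, delete_one, batch_size=10, delay=1.0, sleep=time.sleep):
    """Call `delete_one` for each asset, pausing `delay` seconds between batches.

    Returns (deleted, failed) lists so the caller can log or retry failures.
    """
    deleted, failed = [], []
    for start in range(0, len(asset_ids), batch_size):
        if start > 0:
            # Pause between batches only, not before the first one.
            sleep(delay)
        for asset_id in asset_ids[start:start + batch_size]:
            try:
                delete_one(asset_id)
                deleted.append(asset_id)
            except Exception:
                # Assumption: one failed delete should not abort the whole run.
                failed.append(asset_id)
    return deleted, failed
```

Lowering `batch_size` or raising `delay` reduces the request rate at the cost of a longer run.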
| 148 | + |
| 149 | +--- |
| 150 | + |
| 151 | +### Support |
| 152 | +If you need help adapting the script to your environment or adding filters (e.g., by asset type), reach out to your Comet contact. |