Commit 3f0d3ff

feat: add script for EM asset cleanup
- frees up space on the Comet instance by removing old assets
1 parent 53f6cb1 commit 3f0d3ff

7 files changed: +1227 -26 lines changed

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -103,3 +103,8 @@ venv.bak/
 
 # mypy
 .mypy_cache/
+data/*
+
+# pytest
+.pytest_cache/
+MagicMock/

.pre-commit-config.yaml

Lines changed: 0 additions & 26 deletions
This file was deleted.
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-toml
      - id: check-added-large-files
        args: ["--maxkb=10000"]
      - id: check-merge-conflict
      - id: detect-private-key
      - id: check-case-conflict
      - id: mixed-line-ending

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.12.12
    hooks:
      - id: ruff-format
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]

rest_api/asset_cleanup/README.md

Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
### Comet Asset Cleanup Script

A safe, automatable Python utility that reduces cloud storage usage by identifying and deleting old experiment assets in Comet.

This script can:
- Iterate over experiments in a workspace (single project or all projects)
- List assets per experiment and evaluate each asset's age
- Save a JSON "deletion plan" before any destructive action
- Delete assets older than a threshold (opt-in, with confirmation)

By default, the script runs in dry-run mode; it deletes nothing unless you pass `--execute`.


---

### Requirements
- Python 3.9+
- Poetry (recommended)
- Comet API key with access to the target workspace/projects

Environment variables (recommended via `.env`):
- `COMET_API_KEY` (required)
- `COMET_WORKSPACE` (optional if passed via `--workspace`)
- `COMET_PROJECT` (optional; omit to process all projects)

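The precedence described above (command-line value first, then environment variable) could be resolved along these lines. This is a minimal sketch, not the script's actual code: `resolve_settings` is a hypothetical helper, and it assumes any `.env` values have already been loaded into the process environment.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_settings(
    cli_workspace: Optional[str] = None,
    cli_project: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str, Optional[str]]:
    """Return (api_key, workspace, project). A project of None means "all projects".

    Hypothetical helper: the names and error messages are illustrative only.
    """
    api_key = env.get("COMET_API_KEY")
    if not api_key:
        raise ValueError("COMET_API_KEY is required")

    # A value passed on the command line takes precedence over the environment.
    workspace = cli_workspace or env.get("COMET_WORKSPACE")
    if not workspace:
        raise ValueError("Set COMET_WORKSPACE or pass --workspace")

    # An empty or missing project is normalized to None (process all projects).
    project = cli_project or env.get("COMET_PROJECT") or None
    return api_key, workspace, project
```

Passing the environment as a mapping keeps the helper easy to exercise with a plain dictionary instead of real environment variables.
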

---

### Setup
1) Install dependencies (project uses Poetry):

```bash
poetry install --no-root
```

2) Create a `.env` in the repo root (or export env vars in your shell):

```bash
echo "COMET_API_KEY=your_api_key_here" >> .env
echo "COMET_WORKSPACE=your_workspace_name" >> .env
# COMET_PROJECT is optional; omit to process all projects
```


---

### Usage
General form:

```bash
poetry run python scripts/comet_asset_cleanup.py \
  --workspace <WORKSPACE> \
  [--project <PROJECT>] \
  [--days-threshold 365] \
  [--include-archived] \
  [--dry-run] [--execute] [--yes] \
  [--batch-size 10] [--delay 1.0]
```

Flags:
- `--workspace`: Workspace to process (required if not set in env)
- `--project`: Project to process; omit to process ALL projects in the workspace
- `--days-threshold`: Assets older than this many days are targeted (default: 365)
- `--include-archived`: Include archived experiments
- `--dry-run`: Force simulation mode (default behavior)
- `--execute`: Perform deletions (requires confirmation unless `--yes` is passed)
- `--yes`: Skip interactive confirmation (for non-interactive or scheduled runs)
- `--batch-size`: Number of deletions per batch (default: 10)
- `--delay`: Seconds to sleep between batches (default: 1.0)

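For orientation, the flags above map onto an `argparse` parser roughly as follows. This is a sketch under stated assumptions: `build_parser` and `parse_args` are hypothetical names, and letting `--dry-run` override `--execute` is only one reading of "force simulation mode", not confirmed behavior of the script.

```python
import argparse
import os
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the documented flags and defaults (illustrative only)."""
    parser = argparse.ArgumentParser(description="Clean up old Comet experiment assets")
    parser.add_argument("--workspace", default=os.environ.get("COMET_WORKSPACE"))
    parser.add_argument("--project", default=os.environ.get("COMET_PROJECT"))
    parser.add_argument("--days-threshold", type=int, default=365)
    parser.add_argument("--include-archived", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--execute", action="store_true")
    parser.add_argument("--yes", action="store_true")
    parser.add_argument("--batch-size", type=int, default=10)
    parser.add_argument("--delay", type=float, default=1.0)
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.workspace:
        parser.error("--workspace is required when COMET_WORKSPACE is not set")
    # Assumption: an explicit --dry-run wins over --execute.
    args.execute = args.execute and not args.dry_run
    return args
```

With this shape, `parse_args(["--workspace", "your-ws"])` yields a dry run, because `execute` stays `False` unless `--execute` is given.
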

Examples:

Dry-run, one project:
```bash
poetry run python comet_asset_cleanup.py \
  --workspace your-ws --project your-proj --days-threshold 365
```

Dry-run, all projects in a workspace:
```bash
poetry run python comet_asset_cleanup.py \
  --workspace your-ws --days-threshold 365
```

Execute, include archived, non-interactive:
```bash
poetry run python comet_asset_cleanup.py \
  --workspace your-ws --project your-proj --days-threshold 365 \
  --include-archived --execute --yes
```


---

### What gets saved
- A deletion plan JSON is always written before any deletion (also in dry runs):

  - File name pattern: `asset_deletion_plan_<workspace>_<project>_<timestamp>.json`
  - Structure:

```json
{
  "my-workspace": {
    "my-project": {
      "abcd1234_experimentkey": [
        "gradient_layer3.4_weight.json",
        "..."
      ]
    }
  }
}
```

- Logs are written to `comet_asset_cleanup.log` and printed to console.

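Writing a plan with the nesting and file name pattern shown above might look like the following. This is a hedged sketch: `save_deletion_plan` is a hypothetical helper, and both the timestamp format and the `all_projects` label used when no project is given are assumptions, since the README does not specify them.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional

# workspace -> project -> experiment key -> asset file names
Plan = Dict[str, Dict[str, Dict[str, List[str]]]]


def save_deletion_plan(
    plan: Plan,
    workspace: str,
    project: Optional[str],
    out_dir: str = ".",
    now: Optional[datetime] = None,
) -> Path:
    """Write the plan as JSON and return the path of the file that was written."""
    # Assumption: timestamp rendered as YYYYMMDD_HHMMSS.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    # Assumption: placeholder label when the run covers every project.
    project_label = project or "all_projects"
    path = Path(out_dir) / f"asset_deletion_plan_{workspace}_{project_label}_{stamp}.json"
    path.write_text(json.dumps(plan, indent=2), encoding="utf-8")
    return path
```

Saving the plan before any delete call leaves a record of exactly what a run intended to remove, which is also what makes dry runs reviewable.
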

---

### How it works (Comet REST API)
The script uses Comet REST endpoints to discover projects, experiments, and assets, and to delete assets:
- Get Projects (used when `--project` is omitted):
  - [Get Projects](https://www.comet.com/docs/v2/api-and-sdk/rest-api/read-endpoints/#get-projects)
- Get Experiments in a project:
  - [Get Experiments](https://www.comet.com/docs/v2/api-and-sdk/rest-api/read-endpoints/#get-experiments)
- List assets for an experiment:
  - [Get Asset List](https://www.comet.com/docs/v2/api-and-sdk/rest-api/read-endpoints/#get-asset-list)
- Delete an asset:
  - [Delete an Asset](https://www.comet.com/docs/v2/api-and-sdk/rest-api/write-endpoints/#delete-an-asset)

The script computes a cutoff date/time (the current time minus `--days-threshold` days) and selects assets whose `createdAt` is earlier than that cutoff.

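The age check can be illustrated with a small filter. This is a sketch, not the script's code: `select_old_assets` is a hypothetical name, assets are modeled as plain dictionaries, and `createdAt` is assumed to be a Unix timestamp in milliseconds, so verify this against the actual asset list response.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List, Optional


def select_old_assets(
    assets: Iterable[Dict[str, Any]],
    days_threshold: int,
    now: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Return the assets whose createdAt is strictly earlier than the cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_threshold)
    # Assumption: createdAt is epoch milliseconds, so compare in the same unit.
    cutoff_ms = int(cutoff.timestamp() * 1000)
    return [
        asset
        for asset in assets
        # Assets without a createdAt value are skipped rather than deleted.
        if asset.get("createdAt") is not None and asset["createdAt"] < cutoff_ms
    ]
```

Skipping assets with no `createdAt` is a conservative choice for a destructive tool: anything whose age cannot be established is left alone.
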

---

### Safety and scheduling
- The default behavior is a non-destructive dry run.
- In execute mode, the script saves a deletion plan JSON and requires you to type `DELETE` at the console to confirm, unless `--yes` is passed.
- Consider scheduling the script as a cron job:

```cron
# Run daily at 2:00am in execute mode across all projects.
# A crontab entry must fit on one line; backslash line continuations are not supported.
0 2 * * * cd /path/to/repo && /usr/local/bin/poetry run python scripts/comet_asset_cleanup.py --workspace your-ws --days-threshold 365 --execute --yes >> cleanup.cron.log 2>&1
```

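The confirmation gate described above could be sketched as follows. `confirm_deletion` is a hypothetical helper; the prompt wording and the exact, case-sensitive match on `DELETE` are assumptions rather than documented behavior.

```python
from typing import Callable


def confirm_deletion(
    asset_count: int,
    assume_yes: bool = False,
    prompt: Callable[[str], str] = input,
) -> bool:
    """Return True only if deletion was explicitly approved."""
    if assume_yes:
        # Corresponds to --yes: skip the prompt for scheduled runs.
        return True
    answer = prompt(f"About to delete {asset_count} assets. Type DELETE to confirm: ")
    # Assumption: only the exact word DELETE counts; anything else aborts.
    return answer.strip() == "DELETE"
```

Requiring a full word instead of `y` makes an accidental confirmation less likely, and injecting `prompt` lets the gate be exercised without a terminal.
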

---

### Notes
- API rate limits: if requests are throttled, lower `--batch-size` or raise `--delay`.
- Authentication: the script sends your API key (set via `COMET_API_KEY`) in the `Authorization` header.
- Project discovery requires permission to call `Get Projects` in the workspace.
- Before enabling execute mode, test in dry-run mode with a small `--days-threshold` and review the resulting deletion plan. A smaller threshold matches more assets.

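How `--batch-size` and `--delay` interact can be shown with a short loop. This is an illustrative sketch: `delete_in_batches` and the `delete_one` callback are hypothetical names, and counting failures instead of aborting on the first error is an assumption about the desired behavior.

```python
import time
from typing import Any, Callable, Sequence, Tuple


def delete_in_batches(
    items: Sequence[Any],
    delete_one: Callable[[Any], None],
    batch_size: int = 10,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, int]:
    """Delete items in batches, pausing between batches. Returns (deleted, failed)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    deleted = failed = 0
    for start in range(0, len(items), batch_size):
        for item in items[start:start + batch_size]:
            try:
                delete_one(item)
                deleted += 1
            except Exception:
                # Assumption: one failed deletion should not stop the whole run.
                failed += 1
        # Pause only if another batch follows.
        if start + batch_size < len(items):
            sleep(delay)
    return deleted, failed
```

With the defaults, 25 assets are processed as three batches with two one-second pauses, so a smaller batch size or a larger delay lowers the request rate.
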

---

### Support
If you need help adapting the script to your environment or adding filters (e.g., by asset type), reach out to your Comet contact.
