Summary
Deleting artifact versions through ZenML's high-level client API appears to be pathologically slow when many unused artifact versions exist in a project.
The hot path seems to be Client._delete_artifact_version(...), which validates deletability using:
```python
if artifact_version not in depaginate(
    self.list_artifact_versions,
    only_unused=True,
):
    raise ValueError(...)
```
This means each individual deletion re-lists the full global set of unused artifact versions before performing the actual delete.
Why this is a problem
Instead of:
- fetch target
- validate target
- delete target
The current behavior is effectively:
- fetch target by ID
- list all unused artifact versions across the project
- check membership
- delete target
...and that happens once per deleted version.
This scales very poorly when many unused artifact versions exist.
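The scaling problem can be illustrated with a small sketch. This uses a stubbed paginated listing (the names `list_unused_page`, `depaginate`, and the page size are illustrative, not ZenML's real API) to count how many list requests a per-delete global scan makes:

```python
# Stubbed paginated "list unused artifact versions" endpoint.
# All names and sizes here are illustrative, not ZenML's actual API.
PAGE_SIZE = 100
UNUSED = [f"version-{i}" for i in range(10_000)]  # many unused versions
list_calls = 0

def list_unused_page(page: int) -> list:
    global list_calls
    list_calls += 1
    start = page * PAGE_SIZE
    return UNUSED[start:start + PAGE_SIZE]

def depaginate() -> list:
    """Fetch every page, as the membership check currently does."""
    out, page = [], 0
    while True:
        chunk = list_unused_page(page)
        if not chunk:
            return out
        out.extend(chunk)
        page += 1

def delete_with_global_scan(target: str) -> None:
    if target not in depaginate():  # one full scan per delete
        raise ValueError("not an unused artifact version")
    # ...the actual delete call would follow here and is cheap

for target in ["version-1", "version-2"]:
    delete_with_global_scan(target)

# Two deletes => two full 101-page scans (100 full pages + 1 empty).
print(list_calls)  # 202
```

With 10,000 unused versions, each delete re-fetches 101 pages, so the request count grows as deletes × pages, which matches the observed request pattern below.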
Concrete evidence
In our REST-backed setup, deleting a single old artifact version took about 41.3s, while the actual DELETE request itself took about 0.1s.
Observed request pattern for 1 deletion:
- GET /api/v1/artifact_versions: 176
- GET /api/v1/artifact_versions/<id>: 1
- DELETE /api/v1/artifact_versions/<id>: 1
For 2 deletions, the cost roughly doubled:
- wall time: ~83.7s
- GET /api/v1/artifact_versions: 351
- actual DELETE calls: 2
So the behavior looks like “one full only_unused=True scan per delete”.
Relevant code path
In zenml/src/zenml/client.py:
```python
def _delete_artifact_version(self, artifact_version):
    if artifact_version not in depaginate(
        self.list_artifact_versions,
        only_unused=True,
    ):
        raise ValueError(...)
    self.zen_store.delete_artifact_version(artifact_version.id)
```
And depaginate(...) fetches every page.
Expected behavior
Deletion should validate the specific target artifact version much more narrowly.
For example, instead of depaginating the entire only_unused=True result set, ZenML could use a targeted query such as:
```
list_artifact_versions(id=<target_id>, only_unused=True, size=1)
```
or another backend-side existence check for just that artifact version.
That would preserve correctness while avoiding the repeated global scan.
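A minimal sketch of what such a targeted check could look like, using a stubbed filtered listing endpoint (the signature and backend behavior here are assumptions for illustration, not ZenML's actual implementation):

```python
# Stubbed backend state: which version IDs are unused.
UNUSED_IDS = {f"version-{i}" for i in range(10_000)}
requests_made = 0

def list_artifact_versions(id=None, only_unused=False, size=None):
    """Stub of a filtered listing endpoint (signature is illustrative)."""
    global requests_made
    requests_made += 1
    if id is not None:
        # Backend filters by ID and unused-status in a single query.
        return [id] if (not only_unused or id in UNUSED_IDS) else []
    raise NotImplementedError("full listing not needed for this check")

def delete_with_targeted_check(target):
    # One cheap request validates this specific version...
    if not list_artifact_versions(id=target, only_unused=True, size=1):
        raise ValueError(f"{target} is not an unused artifact version")
    # ...then the actual delete would follow, e.g.:
    # self.zen_store.delete_artifact_version(target)

for t in ["version-1", "version-2"]:
    delete_with_targeted_check(t)
print(requests_made)  # 2: one targeted request per delete
```

The request count per delete becomes constant instead of proportional to the number of unused versions in the project.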
Workaround
We implemented a Kitaru-side workaround by:
- doing a per-key preflight query for unused versions
- verifying the target IDs
- then calling the lower-level store delete directly
That reduced one deletion in the same environment from ~41.3s to ~0.64s.
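The shape of the workaround looks roughly like the following sketch (function names are illustrative stand-ins, not the actual Kitaru or ZenML API):

```python
# Workaround shape: one preflight listing of unused versions,
# then direct low-level deletes that bypass the high-level path.
deleted = []

def preflight_unused_ids():
    """Stand-in for a single 'list unused versions' query per key."""
    return {"old-1", "old-2", "kept-3"}

def zen_store_delete(version_id):
    """Stand-in for the lower-level store delete call."""
    deleted.append(version_id)

targets = ["old-1", "old-2"]
unused = preflight_unused_ids()  # fetched once, not once per delete
for vid in targets:
    if vid in unused:            # verify the target IDs locally
        zen_store_delete(vid)    # direct store delete, no global scan
print(deleted)  # ['old-1', 'old-2']
```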
So there seems to be a large performance win available upstream in ZenML's high-level delete path itself.
Suggested fix direction
Smallest likely fix:
- replace the global depaginate(..., only_unused=True) membership check with a targeted query for the specific artifact version ID
Longer-term:
- a batch delete API for artifact versions could help as well
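For a sense of what a batch API would buy, here is a hypothetical call shape (this endpoint does not exist in ZenML today; it is purely a sketch of the proposal): N deletions collapse into one validation query plus one delete request, instead of N of each.

```python
# Hypothetical batch-delete shape; not an existing ZenML endpoint.
calls = []

def batch_delete_artifact_versions(version_ids):
    # One backend query validates all targets are unused at once...
    calls.append(("validate_unused", tuple(version_ids)))
    # ...and one request deletes them all.
    calls.append(("delete", tuple(version_ids)))
    return len(version_ids)

n = batch_delete_artifact_versions(["v1", "v2", "v3"])
print(n, len(calls))  # 3 versions deleted in 2 backend calls
```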